SECRETED HUMAN PROTEINS



This application claims the benefit of copending provisional application 
Serial No. 60/032,757, filed December 11, 1996, which is incorporated herein by 
reference. 



TECHNICAL AREA OF THE INVENTION

The invention relates to the area of proteins. More particularly, the 
invention relates to human secreted proteins. 



BACKGROUND OF THE INVENTION

Secreted proteins include such important proteins as growth factors, 
cytokines and their receptors, extracellular matrix proteins, and proteases. 
Nucleotide sequences encoding these proteins can be used to detect disease states in 
which such proteins are implicated and to develop therapeutics for such diseases. 
Thus, there is a need in the art for methods of identifying secreted proteins and the 
nucleotide sequences which encode them. 

SUMMARY OF THE INVENTION 

It is an object of the invention to provide an isolated and purified human 
protein. 

It is yet another object of the invention to provide a fusion protein.

It is still another object of the invention to provide a preparation of
antibodies. 

It is even another object of the invention to provide an isolated and purified 
subgenomic polynucleotide. 

It is yet another object of the invention to provide an isolated gene. 

It is a further object of the invention to provide a DNA construct for 
expressing all or a portion of a human protein. 

It is still another object of the invention to provide a host cell comprising a 
DNA construct. 

It is another object of the invention to provide a homologously recombinant
cell.

It is even another object of the invention to provide a method of producing a 
human protein. 

It is another object of the invention to provide a method of identifying a 
secreted polypeptide which is modified by rough microsomes. 

These and other objects of the invention are provided by one or more of the 
embodiments described below. 

One embodiment of the invention provides an isolated and purified human 
protein. The isolated and purified human protein has an amino acid sequence 
selected from the group consisting of the amino acid sequences shown in SEQ ID 
Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

Another embodiment of the invention provides an isolated and purified 
human protein having an amino acid sequence which is at least 85% identical to an 
amino acid sequence selected from the group consisting of the amino acid 
sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 
33, 34, 35, 36, 37, and 38.

Still another embodiment of the invention provides a polypeptide comprising
at least 6 contiguous amino acids of an amino acid sequence selected from the
group consisting of the amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38.

Even another embodiment of the invention provides a fusion protein. The
fusion protein comprises a first protein segment and a second protein segment fused 
together by means of a peptide bond. The first protein segment consists of at least 
6 contiguous amino acids selected from the group consisting of the amino acid 
sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 
33, 34, 35, 36, 37, and 38. 

Yet another embodiment of the invention provides a preparation of 
antibodies. The antibodies specifically bind to a human protein having an amino 
acid sequence selected from the group consisting of the amino acid sequences 
shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
36, 37, and 38. 

Even another embodiment of the invention provides an isolated and purified 
subgenomic polynucleotide. The isolated and purified subgenomic polynucleotide 
has a nucleotide sequence selected from the group consisting of the nucleotide 
sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, and 19. 

Yet another embodiment of the invention provides an isolated and purified 
subgenomic polynucleotide consisting of at least 10 contiguous nucleotides selected 
from the group consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. 

Still another embodiment of the invention provides an isolated gene. The 
isolated gene corresponds to a cDNA sequence selected from the group consisting 
of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, and 19. 

Another embodiment of the invention provides a DNA construct for 
expressing all or a portion of a human protein. The DNA construct comprises a 
promoter and a polynucleotide segment. The polynucleotide segment encodes at 
least 6 contiguous amino acids of a human protein having an amino acid sequence 
selected from the group consisting of the amino acid sequences shown in SEQ ID 
Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38.
The polynucleotide segment is located downstream from the promoter.
Transcription of the polynucleotide segment initiates at the promoter. 

Even another embodiment of the invention provides a host cell comprising a 
DNA construct. The DNA construct comprises a promoter and a polynucleotide 
segment. The polynucleotide segment encodes at least 6 contiguous amino acids of 
a human protein having an amino acid sequence selected from the group consisting 
of the amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. The polynucleotide segment is 
located downstream from the promoter. Transcription of the polynucleotide 
segment initiates at the promoter. 

Still another embodiment of the invention provides a homologously 
recombinant cell having incorporated therein a new transcription initiation unit. The 
transcription initiation unit comprises in 5' to 3' order an exogenous regulatory 
sequence, an exogenous exon, and a splice donor site. The transcription initiation 
unit is located upstream to a coding sequence of a gene. The gene comprises a 
nucleotide sequence selected from the group consisting of the nucleotide sequences 
shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
and 19. The exogenous regulatory sequence controls transcription of the coding 
sequence of the gene. 

Yet another embodiment of the invention provides a method of producing a 
human protein. A culture of a cell is grown. The cell comprises a DNA construct. 
The DNA construct comprises a promoter and a polynucleotide segment. The 
polynucleotide segment encodes at least 6 contiguous amino acids of a human 
protein having an amino acid sequence selected from the group consisting of the 
amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 
30, 31, 32, 33, 34, 35, 36, 37, and 38. The polynucleotide segment is located 
downstream from the promoter. Transcription of the polynucleotide segment 
initiates at the promoter. The protein is purified from the culture. 

Even another embodiment of the invention provides a method of producing 
a human protein. A culture of a cell is grown. The cell comprises a new 
transcription initiation unit. The transcription initiation unit comprises in 5' to 3'
order an exogenous regulatory sequence, an exogenous exon, and a splice donor
site. The transcription initiation unit is located upstream to a coding sequence of a 
gene. The gene comprises a nucleotide sequence selected from the group consisting 
of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, and 19. The exogenous regulatory sequence controls 
transcription of the coding sequence of the gene. The protein is purified from the 
culture. 

Another embodiment of the invention provides a method of identifying a 
secreted polypeptide which is modified by rough microsomes. A population of 
cDNA molecules is transcribed in vitro whereby a population of cRNA molecules is 
formed. A first portion of the population of cRNA molecules is translated in vitro 
in the absence of rough microsomes whereby a first population of polypeptides is 
formed. A second portion of the population of cRNA molecules is translated in 
vitro in the presence of rough microsomes whereby a second population of 
polypeptides is formed. The first population of polypeptides is compared with the 
second population of polypeptides. Polypeptide members of the second population 
which have been modified by the rough microsomes are detected. 

The present invention thus provides the art with a method for identifying 
secreted proteins or polypeptides, the amino acid sequences of nineteen novel 
human secreted proteins, and the nucleotide sequences which encode these proteins. 
The invention can be used, inter alia, to produce secreted proteins for
therapeutic and diagnostic purposes. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

The inventors have discovered a method for identifying secreted proteins or 
polypeptides. Secreted proteins or polypeptides include soluble proteins which can 
be transported across a membrane, such as a cell membrane, nuclear membrane, or 
membrane of the endoplasmic reticulum, as well as proteins which can be partially 
secreted from a cell, such as membrane-bound receptors. 

Secreted proteins can contain a signal (or secretion leader) sequence, 
located at the N-terminus and including at least several hydrophobic amino acids,
such as phenylalanine, methionine, leucine, valine, or tryptophan. Non-hydrophobic
amino acids can also be included in the signal sequence. Signal sequences are 
described in von Heijne, J. Mol Biol 184:99-105 (1985) and Kaiser and Botstein, 
Mol Cell Biol 5:2382-2391 (1986). Secreted proteins can also be glycosylated by 
post-translational modification. The presence of a signal sequence or the presence 
of glycosylation or both indicate that a particular protein is a secreted protein. 

In order to identify secreted proteins or polypeptides, the method of the 
invention exploits properties of microsomes, which are the closed vesicles that 
result from fragmentation of endoplasmic reticulum. Microsomes can be rough or 
smooth, depending on whether the endoplasmic reticulum from which they were 
derived is studded with ribosomes. Microsomes, particularly rough microsomes, 
have the ability to perform post-translational modifications, such as glycosylation 
and cleavage of signal sequences from proteins or polypeptides. 

To identify secreted proteins, a population of complementary DNA (cDNA)
molecules is transcribed in vitro to synthesize a population of complementary RNA 
(cRNA) molecules. The cDNA molecules can be synthesized by reverse 
transcription of mRNA molecules isolated from a particular cell or tissue type or 
organism using, for example, a commercially available reverse transcriptase enzyme. 
Alternatively, the reverse transcription reaction to form cDNA molecules can be 
conducted on total RNA, without a preliminary purification of mRNA.

Any organism, such as a bacterium, plant, invertebrate, or vertebrate 
organism, can be used as a source of RNA. Particularly preferred sources of RNA 
are mammals, most preferably humans. Tissues, such as liver, brain, kidney, spleen, 
pancreas, or muscle, can be used as a source of RNA. Individual cell types, either 
primary cells or members of established cell lines, such as HeLa, CHO, PC12, P19, 
BHK, COS, or HepG2, are suitable sources of RNA. Tissues or primary cells 
isolated from organisms at a particular stage in development can be used as RNA 
sources. Stem cells, such as hematopoietic, neuronal, and embryonic stem cells, can 
also be used as a source of RNA. 

Total RNA or mRNA can be isolated using methods known in the art. Such 
methods are described, inter alia, in Sambrook et al., Molecular Cloning, A
Laboratory Manual (2d ed., Cold Spring Harbor Press, N.Y., 1989), and
Ausubel et al., Current Protocols in Molecular Biology (Greene Publishing
Associates and John Wiley & Sons, N.Y., 1994). Techniques for RNA isolation 
can be tailored for a particular organism or cell type, as is known in the art. 

Complementary DNA can optionally be obtained from a cDNA library. The 
cDNA library can be derived from the genome of any organism of interest, 
particularly a mammal or a human. Tissue- or cell type-specific cDNA libraries can 
also be used as a source of cDNA. 

Transcription of cDNA molecules in vitro to form cRNA molecules can be 
carried out using any methods known in the art. These methods include, for 
example, placing cDNA into a cloning vector containing a promoter, such as an 
SP6, T7, or T3 polymerase promoter, and transcribing the cDNA using the 
appropriate polymerase. A variety of commercial kits are available for this purpose. 

A first portion of the population of cRNA molecules can be translated in 
vitro, in the absence of rough microsomes, to form a first population of
polypeptides which have not been post-translationally modified. A second portion 
of the population of cRNA molecules can be translated in vitro in the presence of 
rough microsomes. Under the conditions of the in vitro translation reaction, rough 
microsomes can cleave signal sequences from those polypeptides which comprise 
such sequences. Under the same conditions, rough microsomes can also glycosylate 
those polypeptides which contain glycosylation sites. 

Methods of in vitro translation are those which are known in the art, such 
as translation in a reticulocyte lysate system, particularly a rabbit reticulocyte lysate.
Reticulocyte lysate systems can be assembled in the laboratory or purchased 
commercially in kit form. 

Microsomes can be prepared by disruption of tissues or cells by 
homogenization, as is known in the art. If desired, rough and smooth microsomes 
can be separated using well-known techniques, such as sucrose density gradient 
sedimentation. Microsomes are also available commercially, for example,
the canine pancreatic microsomes available from Promega Corp., Madison, WI.

The first population of polypeptides can then be compared with the second
population of polypeptides. This comparison can be by means of, for example, one- 
or two-dimensional polyacrylamide gel electrophoresis, as is known in the art. 
Polypeptides separated in the gels can be detected by any means known in the art, 
such as staining with copper, silver, Coomassie Brilliant Blue, amido black, fast
green FCF, Ponceau S, or a chromophoric label. Separated proteins can also be
visualized using radioactive, chemiluminescent, fluorescent, or enzymatic tags 
incorporated into the proteins before separation. 

The gels can be dried or the proteins can be transferred to membranes, such
as polyvinylidene difluoride membranes. Either the gels or membranes themselves
or photographs of the gels or membranes can be compared by eye. Alternatively,
the gels or membranes can be scanned, for example, with a densitometer and 
analyzed with the aid of a computer. 

Polypeptide members of the second population of polypeptides, which have
been modified by the rough microsomes, can be detected by any means available in
the art. For example, a shift in the position of a polypeptide band can be observed,
indicating an increase in molecular weight of a member of the second population
compared with the corresponding polypeptide member of the first population. Such
an increase in molecular weight indicates that the polypeptide member of the second
population was glycosylated by the rough microsomes.

A shift in the position of a polypeptide band indicating a decrease in 
molecular weight of a member of the second population compared with the 
corresponding polypeptide member of the first population can also be observed. 
This decrease in molecular weight indicates that the polypeptide member of the
second population contained a signal sequence which was cleaved by the rough
microsomes.
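
The band-shift comparison described above can be sketched in code. The following Python fragment is only a minimal illustration, assuming that apparent molecular weights (in kDa) have already been estimated for corresponding bands in the two translation reactions. The function name, the tolerance value, and the example band weights are hypothetical and are not taken from the disclosure.

```python
def classify_shift(mw_without, mw_with, tolerance=0.5):
    """Classify a band by comparing apparent molecular weights (kDa).

    mw_without: band from translation in the absence of rough microsomes
    mw_with:    corresponding band from translation with rough microsomes
    tolerance:  hypothetical threshold below which a shift is ignored
    """
    delta = mw_with - mw_without
    if delta > tolerance:
        # Increase in molecular weight: consistent with glycosylation.
        return "glycosylated"
    if delta < -tolerance:
        # Decrease in molecular weight: consistent with signal sequence cleavage.
        return "signal sequence cleaved"
    return "unmodified"


# Hypothetical band pairs: (name, kDa without microsomes, kDa with microsomes)
bands = [("band A", 30.0, 33.0), ("band B", 25.0, 23.0), ("band C", 40.0, 40.1)]
results = {name: classify_shift(a, b) for name, a, b in bands}
print(results)
```

A polypeptide that is both cleaved and glycosylated may show a net shift in either direction, so a sketch of this kind can flag candidates but cannot by itself distinguish the two modifications.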

Polypeptides which are modified by the rough microsomes are identified as 
secreted polypeptides. Optionally, quantities of cDNA molecules which encode 
secreted polypeptides can be obtained. Molecules of cDNA which encode
polypeptides which are post-translationally modified by the rough microsomes can
be placed into suitable vectors using standard recombinant DNA techniques and
used to transform host cells. Many vectors are available for this purpose, such as
retroviral or adenoviral vectors and bacteriophage, as described below. 

Vectors comprising cDNA which encode secreted polypeptides can be 
introduced into host cells using techniques available in the art. These techniques 
include, but are not limited to, transferrin-polycation-mediated DNA transfer, 
transfection with naked or encapsulated nucleic acids, liposome-mediated cellular 
fusion, intracellular transportation of DNA-coated latex beads, protoplast fusion, 
viral infection, electroporation, and calcium phosphate-mediated transfection. 

The host cells can be any host cells which are capable of propagating cDNA 
molecules. A variety of host cells, for example immortalized cell lines such as 
HeLa, CHO, or HEK, are available for this purpose. 

Transformed host cells can be diluted serially and cultured to form individual 
colonies. Methods of culturing host cells and the media suitable for each host cell 
type are well known in the art. Preferably, each colony originates from a single 
transformed host cell. Separate preparations of cDNA from each colony can be
prepared, as described above, and transcribed in vitro to form cRNA. The cRNA
can be translated to form secreted polypeptides, which can be purified as is known
in the art. If the preparation of secreted polypeptides from a colony contains more 
than one species of polypeptide, the steps described above can be repeated until a 
colony is obtained which contains cDNA encoding only a single species of 
polypeptide. 

Complementary DNA molecules which encode secreted proteins can be 
sequenced using standard nucleotide sequencing techniques. The sequence of each 
cDNA molecule can be compared with known sequences in a database to determine 
whether the clone encodes a known or a novel secreted protein. 

The inventors have used the method of the invention to identify nineteen 
novel human secreted proteins. Amino acid sequences for these nineteen human 
secreted proteins are disclosed in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. Nucleotide sequences which encode the
proteins are disclosed in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, and 19, respectively.

Clones containing the cDNAs of the secreted proteins were deposited on
December 11, 1997, with the ATCC. Individual bacterial cells (E. coli) in this
composite deposit contain one or more of the polynucleotides encoding the secreted 
proteins of the invention and can be retrieved using an oligonucleotide probe 
designed from the sequence for that particular polynucleotide, as provided herein. 
Each polynucleotide can be removed from the vector by performing an EcoRI/NotI 
digestion (5' site, EcoRI; 3' site, NotI). The deposit submitted to the ATCC has
been designated SECP 120997. The nucleotide sequences of these deposits and the 
amino acid sequences they encode are controlling in the event of a discrepancy 
between the amino acid and nucleotide sequences disclosed herein and those 
contained in the deposits. 

A purified and isolated subgenomic polynucleotide of the present invention 
comprises at least 10, 12, 15, 18, 20, 25, 30, 35, 40, 45, or 50 contiguous 
nucleotides selected from the nucleotide sequences shown in SEQ ID NOs:1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. The isolated and purified 
subgenomic polynucleotides can comprise an entire nucleotide sequence selected 
from the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, and 19. 

Subgenomic polynucleotides contain less than a whole chromosome and are 
preferably intron-free. Polynucleotides of the invention can be isolated and purified 
free from other nucleotide sequences by standard nucleic acid purification 
techniques, using restriction enzymes and probes to isolate fragments comprising 
the coding sequences. 

Isolated genes corresponding to the cDNA sequences disclosed herein are 
also provided. Known methods can be used to isolate the corresponding genes 
using the provided cDNA sequences. These methods include preparation of probes 
or primers from the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 for use in identifying or amplifying
the genes from human genomic libraries or other sources of human genomic DNA. 

The coding sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, and 19 can be made using reverse transcriptase with
human mRNA as a template. Amplification by PCR can also be used to obtain the
polynucleotides, using either genomic DNA or cDNA as a template. Polynucleotide 
molecules of the invention can also be made using the techniques of synthetic 
chemistry given the sequences disclosed herein. The degeneracy of the genetic code 
permits alternate nucleotide sequences which will encode the amino acid sequences 
shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, and 38 to be synthesized. All such nucleotide sequences are within the 
scope of the present invention. 
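
To illustrate how quickly degeneracy multiplies the number of possible coding sequences, the following Python sketch counts the nucleotide sequences that encode a given peptide under the standard genetic code. It is an illustration only; the function name and the short example peptides are hypothetical and are not sequences of the invention.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Number of synonymous codons for each amino acid ("*" marks stop codons).
CODONS_PER_RESIDUE = Counter(CODON_TABLE.values())


def count_encodings(peptide):
    """Return how many distinct nucleotide sequences encode the peptide.

    Residues are given as one-letter codes; an unrecognized letter
    contributes a factor of zero.
    """
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide)


print(count_encodings("MW"))   # methionine and tryptophan each have one codon
print(count_encodings("LSR"))  # leucine, serine, arginine each have six codons
```

Even a three-residue peptide such as the hypothetical "LSR" has 216 possible coding sequences, which is why alternate nucleotide sequences are described by the amino acid sequence they encode rather than listed individually.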

Polynucleotide molecules of the invention can be propagated in vectors and 
cell lines as is known in the art. Polynucleotide molecules can be on linear or 
circular molecules. They can be on autonomously replicating molecules or on 
molecules without replication sequences. For propagation, polynucleotides of the 
invention can be introduced into suitable host cells using any techniques available in 
the art, as described above. 

Subgenomic polynucleotides of the invention can be used to propagate 
additional copies of the polynucleotides or to express protein, polypeptides, or 
fusion proteins. The subgenomic polynucleotides disclosed herein can also be used, 
for example, as biomarkers for tissues or chromosomes, as molecular weight 
markers for DNA gels, to elicit immune responses, such as the formation of 
antibodies against single- or double-stranded DNA, and in DNA-ligand interaction 
assays, to detect proteins or other molecules which interact with the nucleotide 
sequences. 

Disease states may be associated with alterations in the expression of genes 
which encode proteins of the invention. Polynucleotide sequences disclosed herein 
can also be used to determine the involvement of any of these sequences in disease 
states. For example, a gene in a diseased cell can be sequenced and compared with 
a wild-type coding sequence of the invention. Alternatively, nucleotide probes can 
be constructed and used to detect normal or altered (mutant) forms of mRNA in a
diseased cell. Subgenomic polynucleotides of the invention can also be used to 
design diagnostic tests and therapeutic compositions for diseases which may be 
associated with altered expression of these genes. 






WO 98/25959 



PCT/US97/22787 



The present invention provides both full-length and mature forms of the 
disclosed proteins. Full-length forms of the proteins have the amino acid sequences 
shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, and 38. The full-length forms of a protein can be processed enzymatically
to remove a signal sequence, resulting in a mature form of the protein. Signal
sequences can be identified by examination of the amino acid sequences disclosed
herein and comparison with amino acid sequences of known signal sequences (see,
e.g., von Heijne, 1985; Kaiser & Botstein, 1986). Similarly, transmembrane
domains can be identified by examination of the amino acid sequences disclosed
herein. A transmembrane domain typically contains a long stretch of 15-30
hydrophobic amino acids.
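
A crude way to examine a sequence for such a stretch is sketched below in Python. The sketch assumes that "hydrophobic" means the non-polar residues A, V, L, I, P, F, M, and W and simply reports uninterrupted runs of at least 15 such residues. Both the residue set and the example sequence are assumptions made for illustration; practical predictions generally rely on hydropathy scales rather than exact runs.

```python
import re

# Assumed hydrophobic residue set (one-letter codes); illustrative only.
HYDROPHOBIC = "AVLIPFMW"


def hydrophobic_stretches(seq, min_len=15):
    """Return (start, end, segment) for each uninterrupted hydrophobic run.

    Positions are 1-based and inclusive. Only runs of at least
    min_len residues are reported.
    """
    pattern = re.compile("[%s]{%d,}" % (HYDROPHOBIC, min_len))
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(seq)]


# Hypothetical sequence containing one 16-residue hydrophobic run.
example = "MKT" + "LLVVAAILLFFAVLIL" + "RKDE"
print(hydrophobic_stretches(example))
```

For the hypothetical example, the run spans residues 4 to 19; sequences without a sufficiently long run return an empty list.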

Other domains with predicted functions can also be identified. For example, 
the protein having the amino acid sequence shown in SEQ ID NO:23 comprises a
Kunitz type serine protease inhibitor domain spanning amino acids 68 to 122 of
SEQ ID NO:23. The protein having the amino acid sequence shown in SEQ ID
NO:20 contains a zinc-finger motif.

Allelic variants of the disclosed subgenomic polynucleotides can occur and 
encode proteins which are identical, homologous, or substantially related to amino 
acid sequences disclosed herein (see below). 

Allelic variants of subgenomic polynucleotides of the invention can be
identified by hybridization of putative allelic variants with nucleotide sequences
disclosed herein under stringent conditions. For example, by using the following
wash conditions (2 x SSC, 0.1% SDS, room temperature twice, 30 minutes each;
then 2 x SSC, 0.1% SDS, 50 °C once, 30 minutes; then 2 x SSC, room
temperature twice, 10 minutes each), allelic variants can be identified which contain
at most about 25-30% basepair mismatches. More preferably, allelic variants
contain 15-25% basepair mismatches, even more preferably 5-15% basepair
mismatches.
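
The mismatch percentages above can be made concrete with a short calculation. The Python sketch below computes the percentage of mismatched positions between two pre-aligned sequences of equal length and checks it against an upper limit. It is illustrative only: hybridization itself performs no such alignment, and the function names, the default limit of 30%, and the example sequences are assumptions.

```python
def percent_mismatch(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * mismatches / len(seq_a)


def within_limit(seq_a, seq_b, limit=30.0):
    """True if the mismatch percentage does not exceed the assumed limit."""
    return percent_mismatch(seq_a, seq_b) <= limit


# Hypothetical 10-base sequences differing at two positions.
reference = "ACGTACGTAC"
candidate = "ACGTTCGTAA"
print(percent_mismatch(reference, candidate))
print(within_limit(reference, candidate))
```

Two differences over ten positions give 20% mismatches, which falls within the 15-25% range mentioned above.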

Protein variants of secreted proteins of the invention are also included.
Amino acids which are not involved in regions which determine biological activity
can be deleted or modified without affecting biological function. Preferably, protein
variants of the invention have amino acid sequences which are at least 85%, 90%,
or 95% identical to the amino acid sequences disclosed herein and have similar 
biological properties (see below). More preferably, the molecules are 98% 
identical. Modifications of interest in the protein sequences can include the
alteration, substitution, replacement, insertion or deletion of a selected amino acid 
residue. Proteins or derivatives can be either glycosylated or unglycosylated. 
Techniques for making such modifications are well known to those skilled in the art 
(see, e.g., U.S. 4,518,584). Alternatively, variants of proteins disclosed herein can
be constructed using techniques of synthetic chemistry or using recombinant DNA 
methods. 

Preferably, amino acid changes in variants or derivatives of proteins of the 
invention are conservative amino acid changes, i.e., substitutions of similarly 
charged or uncharged amino acids. A conservative amino acid change involves 
substitution of one amino acid for another amino acid of a family of amino acids 
which are structurally related in their side chains. Naturally occurring amino acids 
are generally divided into four families: acidic (aspartate, glutamate), basic (lysine, 
arginine, histidine), non-polar (alanine, valine, leucine, isoleucine, proline, 
phenylalanine, methionine, tryptophan), and uncharged polar (glycine, asparagine, 
glutamine, cysteine, serine, threonine, tyrosine) amino acids. Phenylalanine,
tryptophan, and tyrosine are sometimes classified as aromatic amino acids. It is 
reasonable to expect that an isolated replacement of a leucine with an isoleucine or 
valine, an aspartate with a glutamate, a threonine with a serine, or a similar 
replacement of an amino acid with a structurally related amino acid will not have a 
major effect on the binding properties of the resulting molecule, especially if the 
replacement does not involve an amino acid at a binding site involved in an 
interaction of the protein. Non-naturally occurring amino acids can also be used to 
form protein variants of the invention. 
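
The four-family classification above lends itself to a simple lookup. The following Python sketch treats a substitution as conservative when both residues belong to the same family; the one-letter codes and the function name are conventions chosen for illustration, and the family memberships follow the grouping given in the preceding paragraph.

```python
# Amino acid families as grouped in the text, using one-letter codes.
FAMILIES = {
    "acidic": "DE",
    "basic": "KRH",
    "non-polar": "AVLIPFMW",
    "uncharged polar": "GNQCSTY",
}

# Reverse lookup: residue -> family name.
FAMILY_OF = {aa: name for name, members in FAMILIES.items() for aa in members}


def is_conservative(original, replacement):
    """True if both residues fall in the same family."""
    return FAMILY_OF[original] == FAMILY_OF[replacement]


# Examples drawn from the replacements discussed in the text.
print(is_conservative("L", "I"))  # leucine -> isoleucine
print(is_conservative("D", "E"))  # aspartate -> glutamate
print(is_conservative("T", "S"))  # threonine -> serine
print(is_conservative("L", "D"))  # leucine -> aspartate (different families)
```

A lookup of this kind only reflects side-chain family membership; whether a particular change preserves function still has to be determined by assay.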

Whether an amino acid change results in a functional protein or polypeptide 
can readily be determined by assaying biological properties of the disclosed proteins 
or polypeptides, as described below. Species homologs of human subgenomic 
polynucleotides and proteins of the invention can also be identified by making
suitable probes or primers and screening cDNA expression libraries from other
species, such as mice, monkeys, yeast, or bacteria. 

In the case of proteins which are membrane-bound, such as cell surface 
receptor proteins, soluble forms of the proteins can be obtained by deleting the 
nucleotide sequences which encode part or all of the intracellular and 
transmembrane domains of the protein and expressing a fully secreted form of the 
protein in a host cell. Techniques for identifying intracellular and transmembrane 
domains, such as homology searches, can be used to identify such domains in 
proteins of the invention using amino acid and nucleotide sequences disclosed 
herein. 

Polypeptides consisting of less than full-length proteins of the present 
invention are also provided. Polypeptides of the invention can be linear or can be 
cyclized, for example, as described in Saragovi et al., 1992, Bio/Technology 10,
773-778 and McDowell et al., 1992, J. Amer. Chem. Soc. 114, 9245-9253.
Polypeptides can be used, for example, as immunogens, diagnostic aids, or 
therapeutics, and to create fusion proteins, as described below. 

Polypeptide molecules consisting of less than the entire amino acid 
sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, and 38 are also provided. Such polypeptides comprise at least 6,
8, 10, 12, 15, 18, or 20 contiguous amino acids of an amino acid sequence shown in
SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
and 38. Polypeptide molecules of the invention can also possess minor amino acid 
alterations which do not substantially affect the ability of the polypeptides to 
interact with specific molecules, such as antibodies. 
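
Whether a given polypeptide comprises at least a stated number of contiguous amino acids of a reference sequence can be checked mechanically. The Python sketch below compares the sets of all fragments of length k from each sequence; the function names and both example sequences are hypothetical and are not sequences of the invention.

```python
def kmers(seq, k):
    """Return the set of all contiguous fragments of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_contiguous(candidate, reference, k=6):
    """True if candidate contains at least k contiguous residues of reference."""
    return not kmers(candidate, k).isdisjoint(kmers(reference, k))


# Hypothetical reference and candidate sequences (one-letter codes).
reference = "MKTLLVAGSPQR"
print(shares_contiguous("GGLLVAGSYY", reference))  # shares the 6-mer LLVAGS
print(shares_contiguous("LLVAGT", reference))      # shares only 5 residues
```

Raising k to 8, 10, 12, 15, 18, or 20 applies the same test to the longer fragment lengths recited above.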

Derivatives of the polypeptides, such as glycosylated forms, aggregative 
conjugates with other molecules, and covalent conjugates with unrelated chemical 
moieties, are also provided. Derivatives also include allelic variants, species 
variants, and muteins. Covalent derivatives are prepared by linkage of
functionalities to groups which are found in the amino acid chain or at the N- or C- 
terminal residue by means known in the art. Truncations or deletions of regions 
which do not affect biological function are also encompassed. Truncated or deleted
polypeptides can be prepared synthetically or recombinantly, or by proteolytic
digestion of purified or partially purified secreted proteins of the invention. 

Fusion proteins comprising at least 6, 8, 10, 12, 15, 18, or 20 contiguous 
amino acids of the disclosed proteins can also be constructed. Human fusion 
proteins are useful, inter alia, for generating antibodies against amino acid 
sequences and for use in various assay systems. For example, fusion proteins can 
be used to identify proteins which interact with secreted proteins of the invention 
and influence their function. Physical methods, such as protein affinity 
chromatography, or library-based assays for protein-protein interactions, such as the 
yeast two-hybrid or phage display systems, can be used for this purpose. Such 
methods are well known in the art and can also be used as drug screens. Fusion 
proteins can also be used to target molecules to a specific location in a cell or to 
cause a molecule to be secreted or to be anchored in a cellular membrane. 

Fusion proteins of the invention comprise two protein segments which are 
fused together with a peptide bond. The first protein segment comprises at least 6, 
8, 10, 12, 15, 18, or 20 contiguous amino acids selected from an amino acid 
sequence shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 
33, 34, 35, 36, 37, and 38. The first protein segment can also be a full-length 
protein (comprising a signal sequence) or a mature protein (lacking a signal 
sequence). The second protein segment can be a full-length protein or a protein 
fragment. The second protein or protein fragment can be labeled with a detectable 
marker, such as a radioactive, chemiluminescent, biotinylated, or fluorescent tag, or 
can be an enzyme which will generate a detectable product. Enzymes suitable for 
this purpose, such as β-galactosidase, are well known in the art. 

Techniques for making fusion proteins, either recombinantly or by 
covalently linking two protein segments, are well known in the art. Fusion proteins 
comprising amino acid sequences of the invention can also be constructed, for 
example, using standard recombinant DNA methods to make a DNA construct 
which comprises contiguous nucleotides selected from SEQ ID NOs:1, 2, 3, 4, 5, 6, 
7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 and encoding the desired amino 
acids in proper reading frame with nucleotides encoding the second protein 
segment. 
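
The requirement that the two coding segments be joined in proper reading frame can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the standard genetic code is assumed, and the two short nucleotide strings are arbitrary examples rather than disclosed sequences.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence codon by codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def fuse_in_frame(first_cds, second_cds):
    """Join two coding segments and confirm the junction keeps the reading frame."""
    first_protein = translate(first_cds)  # raises if the first segment is out of frame
    if "*" in first_protein:
        raise ValueError("first segment contains a stop codon before the junction")
    fused = first_cds.upper() + second_cds.upper()
    return fused, translate(fused)


# Hypothetical segments used only for illustration.
fused_dna, fused_protein = fuse_in_frame("ATGGCGTGGCGG", "GGATCCTAA")
print(fused_dna, fused_protein)
```

Checking the first segment on its own is a deliberate choice in this sketch: a first segment whose length is not a multiple of three, or which carries an internal stop codon, would shift or truncate the second protein segment.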

Proteins or polypeptides of the invention can be purified free from other 
components with which they are normally associated in a cell, such as 
carbohydrates, lipids, subcellular organelles, or other proteins. An isolated protein 
or polypeptide is at least 90% pure. Preferably, the preparations are 95% or 99% 
pure. The purity of a preparation can be assessed, for example, by examining 
electrophoretograms of protein or polypeptide preparations at several pH values 
and at several polyacrylamide concentrations, as is known in the art. 

Standard biochemical methods can be used to isolate proteins of the 
invention from tissues which express the proteins or to isolate proteins, 
polypeptides, or fusion proteins from recombinant host cells into which a DNA 
construct has been introduced. Methods of protein purification, such as size 
exclusion chromatography, ammonium sulfate fractionation, ion exchange 
chromatography, affinity chromatography, crystallization, electrofocusing, or 
preparative gel electrophoresis, are well known and widely used in the art. 

Alternatively, proteins, fusion proteins, or polypeptides of the invention can 
be produced by recombinant DNA methods or by synthetic chemical methods. 
Synthetic chemistry methods, such as solid phase peptide synthesis, can be used to 
synthesize proteins, fusion proteins, or polypeptides. For production of 
recombinant proteins, fusion proteins, or polypeptides, coding sequences selected 
from the nucleotide sequences shown in SEQ ID NOs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
11, 12, 13, 14, 15, 16, 17, 18, and 19 can be expressed in prokaryotic or eukaryotic 
host cells using expression systems known in the art. These expression systems 
include bacterial, yeast, insect, and mammalian cells (see below). 

The resulting expressed protein can then be purified from the culture 
medium or from extracts of the cultured cells using purification procedures known 
in the art. For example, for proteins fully secreted into the culture medium, cell-free 
medium can be diluted with sodium acetate and contacted with a cation exchange 
resin, followed by hydrophobic interaction chromatography. Using this method, the 
desired protein, fusion protein, or polypeptide is typically greater than 95% pure. 
Further purification can be undertaken, using, for example, any of the techniques 
listed above. Proteins, fusion proteins, or polypeptides can also be tagged with an 
epitope, such as a "Flag" epitope (Kodak), and purified using an antibody which 
specifically binds to that epitope. 

It may be necessary to modify a protein produced in yeast or bacteria, for 
example by phosphorylation or glycosylation of the appropriate sites, in order to 
obtain a functional protein. Such covalent attachments can be made using known 
chemical or enzymatic methods. 

Proteins or polypeptides of the invention can also be expressed in cultured 
cells in a form which will facilitate purification. For example, a secreted protein or 
polypeptide can be expressed as a fusion protein comprising, for example, maltose 
binding protein, glutathione-S-transferase, or thioredoxin, and purified using a 
commercially available kit. Kits for expression and purification of such fusion 
proteins are available from companies such as New England BioLabs, Pharmacia, 
and Invitrogen. 

The coding sequences disclosed herein can also be used to construct 
transgenic animals, such as cows, goats, pigs, or sheep. Female transgenic animals 
can then produce proteins, polypeptides, or fusion proteins of the invention in their 
milk. Methods for constructing such animals are known and widely used in the art. 

Isolated proteins, polypeptides, or fusion proteins of the invention can be 
used to obtain a preparation of antibodies which specifically bind to epitopes 
comprising amino acid sequences of the invention. Antibodies of the invention can 
be used, for example, to detect proteins, polypeptides, or fusion proteins of the 
invention which are secreted into culture medium or to identify tissues or cells 
which express these molecules. The antibodies can be polyclonal or monoclonal or 
can be single chain antibodies. Techniques for raising polyclonal and monoclonal 
antibodies and for constructing single chain antibodies are well known in the art. 

Antibodies of the invention bind specifically to epitopes comprising amino 
acid sequences of the invention, preferably to epitopes not present on other 
proteins. Typically a minimum number of contiguous amino acids to encode an 
epitope is 6, 8, or 10. However, more amino acids can be part of an epitope, for 
example, at least 15, 25, or 50, especially to form epitopes which involve non- 
contiguous residues. Specific binding antibodies do not detect other proteins on 
Western blots of proteins or in immunocytochemical assays. Specific binding 
antibodies provide a signal at least ten-fold higher than the signal provided with 
epitopes which do not comprise amino acid sequences of the invention. Antibodies 
which bind specifically to secreted proteins of the invention include those that bind 
to mature or full-length proteins, to polypeptides or degradation products, to fusion 
proteins, or to protein variants. In a preferred embodiment of the invention, the 
antibodies immunoprecipitate the desired protein, fusion protein, or polypeptide 
from solution and react with the protein, fusion protein, or polypeptide on Western 
blots of polyacrylamide gels. 

Techniques for purifying antibodies are those which are available in the art. 
In a preferred embodiment, antibodies are affinity purified by passing the antibodies 
over a column to which amino acid sequences of the invention are bound. The 
bound antibody is then eluted, for example using a buffer with a high salt 
concentration. Any such technique may be chosen to purify antibodies of the 
invention. 

The invention also provides DNA constructs for expressing all or a portion 
of a protein of the invention in a host cell. The DNA construct comprises a 
promoter which is functional in the particular host cell selected. The skilled artisan 
can readily select an appropriate promoter from the large number of cell type- 
specific promoters known and used in the art. The DNA construct can also contain 
a transcription terminator which is functional in the host cell. 

The expression construct comprises a polynucleotide segment which 
encodes all or a portion of a human protein encoded by SEQ ID NOs:1, 2, 3, 4, 5, 
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 or a variant thereof. The 
polynucleotide segment is located downstream from the promoter. Transcription of 
the polynucleotide segment initiates at the promoter. DNA constructs can be linear 
or circular and can contain sequences, if desired, for autonomous replication. 

The host cell comprising the DNA construct can be any suitable prokaryotic 
or eukaryotic cell. Expression systems in bacteria include those described in Chang 
et al., Nature (1978) 275: 615; Goeddel et al., Nature (1979) 281: 544; Goeddel et 
al., Nucleic Acids Res. (1980) 8: 4057; EP 36,776; U.S. 4,551,433; deBoer et al., 
Proc. Natl. Acad. Sci. USA (1983) 80: 21-25; and Siebenlist et al., Cell (1980) 20: 
269. 

Expression systems in yeast include those described in Hinnen et al., Proc. 
Natl. Acad. Sci. USA (1978) 75: 1929; Ito et al., J. Bacteriol. (1983) 153: 163; 
Kurtz et al., Mol. Cell. Biol. (1986) 6: 142; Kunze et al., J. Basic Microbiol. 
(1985) 25: 141; Gleeson et al., J. Gen. Microbiol. (1986) 132: 3459; Roggenkamp 
et al., Mol. Gen. Genet. (1986) 202: 302; Das et al., J. Bacteriol. (1984) 158: 
1165; De Louvencourt et al., J. Bacteriol. (1983) 154: 737; Van den Berg et al., 
Bio/Technology (1990) 8: 135; Kunze et al., J. Basic Microbiol. (1985) 25: 141; 
Cregg et al., Mol. Cell. Biol. (1985) 5: 3376; U.S. 4,837,148; U.S. 4,929,555; 
Beach and Nurse, Nature (1981) 300: 706; Davidow et al., Curr. Genet. (1985) 10: 
380; Gaillardin et al., Curr. Genet. (1985) 10: 49; Ballance et al., Biochem. 
Biophys. Res. Commun. (1983) 112: 284-289; Tilburn et al., Gene (1983) 26: 205- 
221; Yelton et al., Proc. Natl. Acad. Sci. USA (1984) 81: 1470-1474; Kelly and 
Hynes, EMBO J. (1985) 4: 475-479; EP 244,234; and WO 91/00357. 

Expression of heterologous genes in insects can be accomplished as 
described in U.S. 4,745,051; Friesen et al. (1986) "The Regulation of Baculovirus 
Gene Expression" in: THE MOLECULAR BIOLOGY OF BACULOVIRUSES (W. Doerfler, 
ed.); EP 127,839; EP 155,476; Vlak et al., J. Gen. Virol. (1988) 69: 765-776; 
Miller et al., Ann. Rev. Microbiol. (1988) 42: 177; Carbonell et al., Gene (1988) 
73: 409; Maeda et al., Nature (1985) 315: 592-594; Lebacq-Verheyden et al., Mol. 
Cell. Biol. (1988) 8: 3129; Smith et al., Proc. Natl. Acad. Sci. USA (1985) 82: 
8404; Miyajima et al., Gene (1987) 58: 273; and Martin et al., DNA (1988) 7: 99. 
Numerous baculoviral strains and variants and corresponding permissive insect host 
cells from hosts are described in Luckow et al., Bio/Technology (1988) 6: 47-55; 
Miller et al., in Genetic Engineering (Setlow, J.K. et al. eds.), Vol. 8 (Plenum 
Publishing, 1986), pp. 277-279; and Maeda et al., Nature (1985) 315: 592-594. 

Mammalian expression can be accomplished as described in Dijkema et al., 
EMBO J. (1985) 4: 761; Gorman et al., Proc. Natl. Acad. Sci. USA (1982b) 79: 
6777; Boshart et al., Cell (1985) 41: 521; and U.S. 4,399,216. Other features of 
mammalian expression can be facilitated as described in Ham and Wallace, Meth. 
Enz. (1979) 58: 44; Barnes and Sato, Anal. Biochem. (1980) 102: 255; U.S. 
4,767,704; U.S. 4,657,866; U.S. 4,927,762; U.S. 4,560,655; WO 90/103430; WO 
87/00195; and U.S. RE 30,985. 

DNA constructs of the invention can be introduced into host cells using any 
technique known in the art. These techniques include transferrin-polycation- 
mediated DNA transfer, transfection with naked or encapsulated nucleic acids, 
liposome-mediated cellular fusion, intracellular transportation of DNA-coated latex 
beads, protoplast fusion, viral infection, electroporation, and calcium phosphate- 
mediated transfection. 

Alternatively, expression of an endogenous gene encoding a protein of the 
invention can be manipulated by introducing by homologous recombination a DNA 
construct comprising a transcription unit in frame with the endogenous gene, to 
form a homologously recombinant cell comprising the transcription unit. The 
transcription unit comprises a targeting sequence, a regulatory sequence, an exon, 
and an unpaired splice donor site. The new transcription unit can be used to turn 
the endogenous gene on or off as desired. This method of affecting endogenous 
gene expression is taught in U.S. 5,641,670, which is incorporated herein by 
reference. 

The targeting sequence is a segment of at least 10, 12, 15, 20, or 50 
contiguous nucleotides selected from the nucleotide sequences shown in SEQ ID 
NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. The 
transcription unit is located upstream to a coding sequence of the endogenous 
gene. The exogenous regulatory sequence directs transcription of the coding 
sequence of the endogenous gene. 

Secreted proteins of the invention have a variety of uses. For example, 
secreted proteins can be used in assays to determine biological activities, such as 
cytokine, cell proliferation, or cellular differentiation activities, tissue growth or 
regeneration, activin or inhibin activity, chemotactic or chemokinetic activity, 
hemostatic or thrombolytic activity, receptor/ligand activity, tumor inhibition, or 
anti-inflammatory activity. Assays for these activities are known in the art and are 
disclosed, for example, in U.S. 5,654,173, which is incorporated herein by 
reference. 

Proteins of the invention can also be used as biomarkers, to identify tissues 
or cell types which express the proteins, or a stage- or disease-specific alteration in 
protein expression. Proteins of the invention can be used in protein interaction 
assays, to identify ligands or binding proteins. Compounds which affect the 
biological activities of the secreted proteins or their ability to interact with specific 
ligands can be identified using proteins of the invention in screening assays. 
Proteins and antibodies of the invention can also be used to design diagnostic tests 
and therapeutic compositions for diseases which may be associated with altered 
expression of these proteins. Fusion proteins comprising, for example, signal 
sequences or transmembrane domains of the disclosed proteins, can be used to 
target other protein domains to cellular locations in which the domains are not 
normally found, such as bound to a cellular membrane or secreted extracellularly. 

Further objects, features, and advantages of the present invention will 
readily occur to the skilled artisan provided with the disclosure above. 

SYNOPSIS OF THE INVENTION 

1. An isolated and purified human protein having an amino acid 
sequence selected from the group consisting of the amino acid sequences shown in 
SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 
and 38. 

2. An isolated and purified human protein having an amino acid 
sequence which is at least 85% identical to an amino acid sequence selected from 
the group consisting of the amino acid sequences shown in SEQ ID Nos:20, 21, 22, 
23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

3. The isolated and purified human protein of item 2 wherein the amino 
acid sequence is at least 90% identical. 

4. The isolated and purified human protein of item 2 wherein the amino 
acid sequence is at least 95% identical. 

5. The isolated and purified human protein of item 2 wherein the amino 
acid sequence is at least 98% identical. 

6. An isolated and purified human polypeptide comprising at least 6 
contiguous amino acids of an amino acid sequence selected from the group 
consisting of the amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

7. A fusion protein comprising a first protein segment and a second 
protein segment fused together by means of a peptide bond, wherein the first 
protein segment consists of at least 6 contiguous amino acids selected from the 
group consisting of the amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23, 
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

8. A preparation of antibodies which specifically bind to the human 
protein of item 1. 

9. The preparation of antibodies of item 8 wherein the antibodies are 
monoclonal. 

10. The preparation of antibodies of item 8 wherein the antibodies are 
polyclonal. 

11. The preparation of antibodies of item 8 wherein the antibodies are 
single chain antibodies. 

12. An isolated and purified subgenomic polynucleotide having a 
nucleotide sequence selected from the group consisting of the nucleotide sequences 
shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 
and 19. 

13. An isolated and purified subgenomic polynucleotide consisting of at 
least 10 contiguous nucleotides of a nucleotide sequence selected from the group 
consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. 

14. An isolated gene corresponding to a cDNA sequence selected from 
the group consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. 

15. A DNA construct for expressing all or a portion of a human protein 
having an amino acid sequence selected from the group consisting of the amino acid 
sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 
33, 34, 35, 36, 37, and 38, comprising: 

a promoter; and 

a polynucleotide segment encoding at least 6 contiguous amino acids 
of the human protein, wherein the polynucleotide segment is located downstream 
from the promoter, wherein transcription of the polynucleotide segment initiates at 
or 3' to the promoter. 

16. A host cell comprising a DNA construct comprising: 
a promoter; and 

a polynucleotide segment encoding at least 6 contiguous amino acids 
of a human protein having an amino acid sequence selected from the group 
consisting of the amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23, 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38, wherein the 
polynucleotide segment is located downstream from the promoter and wherein 
transcription of the polynucleotide segment initiates at or 3' to the promoter. 

17. A homologously recombinant cell having incorporated therein a new 
transcription initiation unit, wherein the new transcription initiation unit comprises 
in 5' to 3' order: 

(a) an exogenous regulatory sequence; 

(b) an exogenous exon; and 

(c) a splice donor site, 

wherein the transcription initiation unit is located upstream to a coding sequence of 
a gene, wherein the gene comprises a nucleotide sequence selected from the group 
consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19, and wherein the exogenous regulatory 
sequence controls transcription of the coding sequence of the gene. 

18. A method of producing a human protein, comprising the steps of: 
growing a culture of a cell comprising a DNA construct comprising 
(1) a promoter and (2) a polynucleotide segment encoding at least 6 contiguous 
amino acids of a human protein having an amino acid sequence selected from the 
group consisting of the amino acid sequences shown in SEQ ID Nos:20, 21, 22, 23, 
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38, wherein the 
polynucleotide segment is located downstream from the promoter and wherein 
transcription of the polynucleotide segment initiates at or 3' to the promoter; and 
purifying the protein from the culture. 

19. A method of producing a human protein, comprising the steps of: 
growing a culture of a homologously recombinant cell having 
incorporated therein a new transcription initiation unit, wherein the new 
transcription initiation unit comprises in 5' to 3' order: 

(a) an exogenous regulatory sequence; 

(b) an exogenous exon; and 

(c) a splice donor site, 

wherein the transcription initiation unit is located upstream to a coding sequence of 
a gene, wherein the gene comprises a nucleotide sequence selected from the group 
consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 and wherein the exogenous regulatory 
sequence controls transcription of the coding sequence of the gene; and 
purifying the protein from the culture. 

20. A method of identifying a secreted polypeptide which is modified by 
rough microsomes, comprising the steps of: 

transcribing in vitro a population of cDNA molecules whereby a 
population of cRNA molecules is formed; 

translating a first portion of the population of cRNA molecules in 
vitro in the absence of rough microsomes whereby a first population of polypeptides 
is formed; 

translating a second portion of the population of cRNA molecules in 
vitro in the presence of rough microsomes whereby a second population of 
polypeptides is formed; 

comparing the first population of polypeptides with the second 
population of polypeptides; and 

detecting polypeptide members of the second population which have 
been modified by the rough microsomes. 

21. The method of item 20 wherein the population of cDNA molecules 
is synthesized by reverse transcription of a population of mRNA molecules. 

22. The method of item 21 wherein the mRNA molecules are isolated 
from a mammal. 

23. The method of item 22 wherein the mRNA molecules are isolated 
from a human. 

24. The method of item 20 wherein the population of cDNA molecules 
is obtained from a cDNA library. 

25. The method of item 24 wherein the cDNA library is derived from a 
mammalian genome. 

26. The method of item 25 wherein the cDNA library is derived from a 
human genome. 
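
The comparing and detecting steps of item 20 can be sketched in Python as follows. This is an illustration under stated assumptions, not the method as practiced: it assumes each translation product has been assigned an apparent molecular mass (for example from gel mobility), that a change in mass indicates modification by the rough microsomes, and that a tolerance of 1 kDa separates real shifts from measurement noise. The function name, clone labels, masses, and tolerance are all hypothetical.

```python
def detect_modified(without_microsomes, with_microsomes, tolerance_kda=1.0):
    """Return {clone: shift_kDa} for clones whose apparent mass changes.

    Both arguments map a clone label to an apparent mass in kDa. A negative
    shift is a smaller product in the presence of microsomes; a positive
    shift is a larger one.
    """
    shifts = {}
    for clone, mass_minus in without_microsomes.items():
        mass_plus = with_microsomes.get(clone)
        if mass_plus is None:
            continue  # no product observed in the second translation
        shift = mass_plus - mass_minus
        if abs(shift) > tolerance_kda:
            shifts[clone] = shift
    return shifts


# Hypothetical apparent masses in kDa, for illustration only.
first_population = {"clone_A": 30.0, "clone_B": 45.0, "clone_C": 22.0}
second_population = {"clone_A": 27.5, "clone_B": 45.2, "clone_C": 25.0}
print(detect_modified(first_population, second_population))
```
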
SEQUENCE LISTING 

(1) GENERAL INFORMATION 
(i) APPLICANT: Chiron Corporation 

(ii) TITLE OF THE INVENTION: Secreted Human Proteins 

(iii) NUMBER OF SEQUENCES: 38 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Banner & Witcoff 

(B) STREET: 1001 G Street, NW 

(C) CITY: Washington 

(D) STATE: DC 

(E) COUNTRY: USA 

(F) ZIP: 20001 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ for Windows Version 2.0 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 11-DEC-1997 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 60/032757 

(B) FILING DATE: 11-DEC-1996 



(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Kagan, Sarah A 

(B) REGISTRATION NUMBER: 32141 

(C) REFERENCE/DOCKET NUMBER: 2441.39505; 1369.002; 1452.001 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 202-508-9100 

(B) TELEFAX: 202-508-9299 

(C) TELEX: 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2063 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

GAATTCGGCA CGAGGCCTCA GTCTTCCAGG GCGGCGGTGG GTGTCCGCTT CTCTCTGCTC 60 

TTCGACTGCA CCGCACTCGC GCGTGACCCT GACTCCCCCT AGTCAGCTCA GCGGTGCTGC 120 

CATGGCGTGG CGGCGGCGCG AAGCCGGCGT CGGGGCTCGC GGCGTGTTGG CTCTGGCGTT 180 

GCTCGCCCTG GCCCTGTGCG TGCCCGGGGC CCGGGGCCGG GCTCTCGAGT GGTTCTCGGC 240 

CGTGGTAAAC ATCGAGTACG TGGACCCGCA GACCAACCTG ACGGTGTGGA GCGTCTCGGA 300 

GAGTGGCCGC TTCGGCGACA GCTCGCCCAA GGAGGGCGCG CATGGCCTGG TGGGCGTCCC 360 

GTGGGCGCCC GGCGGAGACC TCGAGGGCTG CGCGCCCGAC ACGCGCTTCT TCGTGCCCGA 420 

GCCCGGCGGC CGAGGGGCCG CGCCCrGGGT CGCCCTGGTG GCTCGTGGGG GCTGCACCTT 480 

CAAGGACAAG GTGCTGGTGG CGGCGCGGAG GAACGCCTCG GCCGTCGTCC TCTACAATGA 540 

GGAGCGCTAC GGGAACATCA CCTTGCCCAT GTCTCACGCG GGAACAGGAA ATATAGTGGT 600 

CATTATGATT AGCTATCCAA AAGGAAGAGA AATTTTGGAG CTGGTGCAAA AAGGAATTCC 660 

AGTAACGATG ACCATAGGGG TTGGCACCCG GCATGTACAG GAGTTCATCA GCGGTCAGTC 720 

TGTGGTGTTT GTGGCCATTG CCTTCATCAC CATGATGATT ATCTCGTTAG CCTGGCTAAT 780 

ATTTTACTAT ATACAGCGTT TCCTATATAC TGGCTCTCAG ATTGGAAGTC AGAGCCATAG 840 

AAAAGAAACT AAGAAAGTTA TTGGCCAGCT TCTACTTCAT ACTGTAAAGC ATGGAGAAAA 900 

GGGAATTGAT GTTGATGCTG AAAATTGTGC AGTGTGTATT GAAAATTTCA AAGTAAAGGA 960 

TATTATTAGA ATTCTGCCAT GCAAGCATAT TTTTCATAGA ATATGCATTG ACCCATGGCT 1020 

TTTGGATCAC CGAACATGTC CAATGTGTAA ACTTGATGTC ATCAAAGCCC TAGGATATTG 1080 

GGGAGAGCCT GGGGATGTAC AGGAGATGCC TGCTCCAGAA TCTCCTCCTG GAAGGGATCC 1140 

AGCTGCAAAT TTGAGTCTAG CTTTACCAGA TGATGACGGA AGTGATGACA GCAGTCCACC 1200 

ATCAGCCTCC CCTGCTGAAT CTGAGCCACA GTGTGATCCC AGCTTTAAAG GAGATGCAGG 1260 

AGAAAATACG GCATTGCTAG AAGCCGGCAG GAGTGACTCT CGGCATGGAG GACCCATCTC 1320 

CTAGCACACG TGCCCACTGA AGTGGCACCA ACAGAAGTTT GGCTTGAACT AAAGGACATT 1380 

TTATTTTTTT TACTTTAGCA CATAATTTGT ATATTTGAAA ATAATGTATA TTATTTTACC 1440 

TATTAGATTC TGATTTGATA TACAAAGGAC TAAGATATTT TCTTCTTGAA GAGACTTTTC 1500 

GATTAGTCCT CATATATTTA TCTACTAAAA TAGAGTGTTT ACCATGAACA GTGTGTTGCT 1560 

TCAGACTATT ACAAAGACAA CTGGGGCAGG TACTCTAATA TAAAGGACAG GTGGTGTTTC 1620 

TAAATAATTG GCTGCTATGG TTCTGTAAAA ACCAGTTAAT TCTATTTTTC AAGGTTTTTG 1680 

GCAAAGCACA TCAATGTTAG ACTAGTTGAA GTGGAATTGT ATAATTCAAT TCGATAATTG 1740 

ATCTCATGGG CTTTCCCTGG AGGAAAGGTT TTTTTTGTTG [illegible] AAGAACTTGA 1800 

AACTTGTAAA CTGAGATGTC TGTAGCTTTT TTGCCCATCT GTAGTGTATG TGAAGATTTC 1860 

AAAACCTGAG AGCACTTTTT CTTTGTTTAG AATTATGAGA AAGGCACTAG ATGACTTTAG 1920 

GATTTGCATT TTTCCCTTTA TTGCCTCATT TCTTGTGACG CCTTGTTGGG GAGGGAAATC 1980 

TGTTTATTTT TTCCTACAAA TAAAAAGCTA AGATTCTATA TCGCAAAAAA AAAAAAAAAA 2040 

AAAAAAAAAA TTCCTGCGGC CGC 2063 
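
For readers working with a machine-readable copy, the row layout of this listing (blocks of bases followed by a cumulative base count) lends itself to a simple consistency check. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes every row ends with its running total. The two example rows are the first two rows of SEQ ID NO:1 as listed above.

```python
def parse_listing_rows(rows):
    """Join listing rows into one sequence, checking each cumulative count."""
    sequence = []
    total = 0
    for row in rows:
        *blocks, count = row.split()  # last field is the running base count
        bases = "".join(blocks)
        total += len(bases)
        if total != int(count):
            raise ValueError(f"row ending at {count}: counted {total} bases")
        sequence.append(bases)
    return "".join(sequence)


ROWS = [
    "GAATTCGGCA CGAGGCCTCA GTCTTCCAGG GCGGCGGTGG GTGTCCGCTT CTCTCTGCTC 60",
    "TTCGACTGCA CCGCACTCGC GCGTGACCCT GACTCCCCCT AGTCAGCTCA GCGGTGCTGC 120",
]
print(len(parse_listing_rows(ROWS)))
```

A row with a dropped or merged block fails the running-count comparison, which makes this a quick way to locate transcription damage in a long listing.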



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1328 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



GAATTCGGCA CGAGGTAGGC AAGGGATAAA AAGGCACCTA AGGCCCTTTT GCAATAAGAA 60 

GCCAGATGGA TAAAGGAAGT GCTGGTCACC CTGGAGGTGT ACTGGTTTGG GGAAGGTCCC 120 

CGGCCCCCAC AGCCCTCTGG GGAGCCTCAC CCTGGCTCTC CCCACTCACC TCAGCCCTCA 180 

GGCAGCCCCT CCACAGGGCC CCTCTCCTGC CTGGACAGCT CTGCTGGTCT CCCCGTCCCC 240 

TGGAGAAGAA CAAGGCCATG GGTCGGCCCC TGCTGCTGCC CCTGCTGCTC CTGCTGCAGC 300 

CGCCAGCATT TCTGCAGCCT GGTGGCTCCA CAGGATCTGG TCCAAGCTAC CTTTATGGGG 360 

TCACTCAACC AAAACACCTC TCAGCCTCCA TGGGTGGCTC TGTGGAAATC CCCTTCTCCT 420 

TCTATTACCC CTGGGAGTTA GCCATAGTTC CCAACGTGAG AATATCCTGG AGACGGGGCC 480 

ACTTCCACGG GCAGTCCTTC TACAGCACAA GGCCGCCTTC CATTCACAAG GATTATGTGA 540 

ACCGGCTCTT TCTGAACTGG ACAGAGGGTC AGGAGAGCGG CTTCCTCAGG ATCTCAAACC 600 

TGCGGAAGGA GGACCAGTCT GTGTATTTCT GCCGAGTCGA GCTGGACACC CGGAGATCAG 660 

GGAGGCAGCA GTTGCAGTCC ATCAAGGGGA CCAAACTCAC CATCACCCAG GCTGTCACAA 720 

CCACCACCAC CTGGAGGCCC AGCAGCACAA CCACCATAGC CGGCCTCAGG GTCACAGAAA 780 

GCAAAGGGCA CTCAGAATCA TGGCACCTAA GTCTGGACAC TGCCATCAGG GTTGCATTGG 840 

CTGTCGCTGT GCTCAAAACT GTCATTTTGG GACTGCTGTG CCTCCTCCTC CTGTGGTGGA 900 

GGAGAAGGAA AGGTAGCAGG GCGCCAAGCA GTGACTTCTG ACCAACAGAG TGTGGGGAGA 960 

AGGGATGTGT ATTAGCCCCG GAGGACGTGA TGTGAGACCC GCTTGTGAGT CCTCCACACT 1020 

CGTTCCCCAT TGGCAAGATA CATGGAGAGC ACCCTGAGGA CCTTTAAAAG GCAAAGCCGC 1080 

AAGGCAGAAG GAGGCTGGGT CCCTGAATCA CCGACTGGAG GAGAGTTACC TACAAGAGCC 1140 

TTCATCCAGG AGCATCCACA CTGCAATGAT ATAGGAATGA GGTCTGAACT CCACTGAATT 1200 

AAACCACTGG CATTTGGGGG CTGTTTATTA TAGCAGTGCA AAGAGTTCCT TTATCCTCCC 1260 

CAAGGATGGA AAAATACAAT TTATTTTGCT TACCATAAAA AAAAAAAAAA AAAAATTCCT 1320 

GCGGCCGC 1328 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1689 base pairs 

(B) TYPE: nucleic acid 






WO 98/25959 



PCT/US97/22787 



(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



GAATTCGGCA CGAGGGCAAG ATTCGATACA AAACCAATGA ACCTGTGTGG GAGGAAAACT 60 

TCACTTTCTT CATTCACAAT CCCAAGCGCC AGGACCTTGA AGTTGAGGTC AGAGACGAGC 120 

AGCACCAGTG TTCCCTGGGG AACCTGAAGG TCCCCCTCAG CCAGCTGCTC ACCAGTGAGG 180 

ACATGACTGT GAGCCAGCGC TTCCAGCTCA GTAACTCGGG TCCAAACAGC ACCATCAAGA 240 

TGAAGATTGC CCTGCGGGTG CTCCATCTCG AAAAGCGAGA AAGGCCTCCA GACCACCAAC 300 

ACTCAGCTCA AGTCAAACGT CCCTCTGTGT CCAAAGAGGG GAGGAAAACA TCCATCAAAT 360 

CTCATATGTC TGGGTCTCCA GGCCCTGGTG GCAGCAACAC AGCTCCATCC ACACCAGTCA 420 

TTGGGGGCAG TGATAAGCCT GGTATGGAAG AAAAGGCCCA GCCCCCTGAG GCCGGCCCTC 480 

AGGGGCTGCA CGACCTGGGC AGAAGCTCCT CCAGCCTCCT GGCCTCCCCA GGCCACATCT 540 

CAGTCAAGGA GCCGACCCCC AGCATCGCCT CGGACATCTC GCTGCCCATC GCCACCCAGG 600 

AGCTGCGGCA AAGGCTGAGG CAGCTGGAAA ACGGGACGAC CCTGGGACAG TCTCCACTGG 660 

GGCAGATCCA GCTGACCATC CGGCACAGCT CGCAGAGAAA CAAGCTTATC GTGGTCGTGC 720 

ATGCCTGCAG AAACCTCATT GCCTTCTCTG AAGACGGCTC TGACCCCTAT GTCCGCATGT 780 

ATTTATTACC AGACAAGAGG CGGTCAGGAA GGAGGAAAAC ACACGTGTCA AAGAAAACAT 840 

TAAATCCAGT GTTTGATCAA AGCTTTGATT TCAGTGTTTC GTTACCAGAA GTGCAGAGGA 900 

GAACGCTCGA CGTTGCCGTG AAGAACAGTG GCGGCTTCCT GTCCAAAGAC AAAGGGCTCC 960 

TTGGCAAAGT ATTGGTTGCT CTGGCATCTG AAGAACTTGC CAAAGGCTGG ACCCAGTGGT 1020 

ATGACCTCAC GGAAGATGGG ACGAGGCCTC AGGCGATGAC ATAGCCGCAG CAGGCAGGAG 1080 

GCGTCCTCTT CAGCGTAGCT CTCCACCTCT ACCCGGAACA CACCCTCTCA CAGACGTACC 1140 

AATGTTATTT TTATAATTTC ATGGATTTAG TTATACATAC CTTAATAGTT TTATAAAATT 1200 

GTTGACATTT CAGGCAAATT TGGCCAATAT TATCATTGAA TTTTCTGTGT TGGATTTCCT 1260 

CTAGGATTTC GCCAGTTCCT ACAACGTGCA GTAGGGCGGC GGTAGCTCTT GTGTCTGTGG 1320 

ACTCTGCTCA GCTGTGTCCG TAGGAGTCGG ATGTGTCTGT GCTTTATTAT GGCCTTGTTT 1380 

ATATATCACT GAGGTATACT ATGCCATGTA AATAGACTAT TTTTTATAAT CTTAACATGC 1440 

TGGTTTAAAT TCAGAAGGAA ATAGATCAAG GAAATATATA TATTTTCTTC TAAAACTTAT 1500 

TAAATTCGTG TGACAAATAA TCATTTTCAT CTTGGCAGCA AAAAGTTCTC AGTGACCTAT 1560 

TTTGTGGTGT TTCTTTTTGA AAAGAAAAGC TGAAATATTA TTAAATGCTA GTATGTTTCT 1620 

GCCCATTATG AAAGATGAAA TAAAGTATTC AAAATATTAA AAAAAAAAAA AAAAAATTCC 1680 

TGCGGCCGC 1689 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1505 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

GAATTCGGCA CGAGGAGCAG ATCTGCAAGA GTTTCGTTTA TGGAGGCTGC TTGGGCAACA 60 

AGAACAACTA CCTTCGGGAA GAAGAGTGCA TTCTAGCCTG TCGGGGTGTG CAAGGTGGGC 120 

CTTTGAGAGG CAGCTCTGGG GCTCAGGCGA CTTTCCCCCA GGGCCCCTCC ATGGAAAGGC 180 

GCCATCCAGT GTGCTCTGGC ACCTGTCAGC CCACCCAGTT CCGCTGCAGC AATGGCTGCT 240 

GCATCGACAG TTTCCTGGAG TGTGACGACA CCCCCAACTG CCCCGACGCC TCCGACGAGG 300 

CTGCCTGTGA AAAATACACG AGTGGCTTTG ACGAGCTCCA GCGCATCCAT TTCCCCAGCG 360 

ACAAAGGGCA CTGCGTGGAC CTGCCAGACA CAGGACTCTG CAAGGAGAGC ATCCCGCGCT 420 

GGTACTACAA CCCCTTCAGC GAACACTGCG CCCGCTTTAC CTATGGTGGT TGTTACGGCA 480 

ACAAGAACAA CTTTGAGGAA GAGCAGCAGT GCCTCGAGTC TTGTCGCGGC ATCTCCAAGA 540 

AGGATGTGTT TGGCCTGAGG CGGGAAATCC CCATTCCCAG CACAGGCTCT GTGGAGATGG 600 

CTGTCGCAGT GTTCCTGGTC ATCTGCATTG TGGTGGTGGT AGCCATCTTG GGTTACTGCT 660 

TCTTCAAGAA CCAGAGAAAG GACTTCCACG GACACCACCA CCACCCACCA CCCACCCCTG 720 

CCAGCTCCAC TGTCTCCACT ACCGAGGACA CGGAGCACCT GGTCTATAAC CACACCACGC 780 

GGCCCCTCTG AGCCTGGGTC TCACCGGCTC TCACCTGGCC CTGCTTCCTG CTTGCCAAGG 840 

CAGAGGCCTG GGCTGGGAAA AACTTTGGAA CCAGACTCTT GCCTGTTTCC CAGGCCCACT 900 

GTGCCTCAGA GACCAGGGCT CCAGCCCCTC TTGGAGAAGT CTCAGCTAAG CTCACGTCCT 960 

GAGAAAGCTC AAAGGTTTGG AAGGAGCAGA AAACCCTTGG GCCAGAAGTA CCAGACTAGA 1020 

TGGACCTGCC TGCATAGGAG TTTGGAGGAA GTTGGAGTTT TGTTTCCTCT GTTCAAAGCT 1080 

GCCTGTCCCT ACCCCATGGT GCTAGGAAGA GGAGTGGGGT GGTGTCAGAC CCTGGAGGCC 1140 

CCAACCCTGT CCTCCCGAGC TCCTCTTCCA TGCTGTGCGC CCAGGGCTGG GAGGAAGGAC 1200 

TTCCCTGTGT AGTTTGTGCT GTAAAGAGTT GCTTTTTGTT TATTTAATGC TGTGGCATGG 1260 

GTGAAGAGGA GGGGAAGAGG CCTGTTTGGC CTCTCTATCC TCTCTTCCTC TTCCCCCAAG 1320 

ATTGAGCTCT CTGCCCTTGA TCAGCCCCAC CCTGGCCTAG ACCAGCAGAC AGAGCCAGGA 1380 

GAAGCTCAGC TGCATTCCGC AGCCCCCACC CCCAAGGTTC TCCAACATCA CAGCCCAGCC 1440 

CGCCCACTGG GTAATAAAAG TGGTTTGTGG AAAAAAAAAA AAAAAAAAAA AAGTCCTGCG 1500 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2002 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

GAATTCGGCA CGAGGGCCAT GGCCGGGCTA TCCCGCGGGT CCGCGCGCGC ACTGCTCGCC 60 

GCCCTGCTGG CGTCGACGCT GTTGGCGCTG CTCGTGTCGC CCGCGCGGGG TCGCGGCGGC 120 

CGGGACCACG GGGACTGGGA CGAGGCCTCC CGGCTGCCGC CGCTACCACC CCGCGAGGAC 180 

GCGGCGCGCG TGGCCCGCTT CGTGACGCAC GTCTCCGACT GGGGCGCTCT GGCCACCATC 240 

TCCACGCTGG AGGCGGTGCG CGGCCGGCCC TTCGCCGACG TCCTCTCGCT CAGCGACGGG 300 

CCCCCGGGCG CGGGCAGCGG CGTGCCCTAT TTCTACCTGA GCCCGCTGCA GCTCTCCGTG 360 

AGCAACCTGC AGGAGAATCC ATATGCTACA CTGACCATGA CTTTGGCACA GACCAACTTC 420 

TGCAAGAAAC ATGGATTTGA TCCACAAAGT CCCCTTTGTG TTCACATAAT GCTGTCAGGA 480 

ACTGTGACCA AGGTGAATGA AACAGAAATG GATATTGCAA AGCATTCGTT ATTCATTCGA 540 

CACCCTGAGA TGAAAACCTG GCCTTCCAGC CATAATTGGT TCTTTGCTAA GTTGAATATA 600 

ACCAATATCT GGGTCCTGGA CTACTTTGGT GGACCAAAAA TCGTGACACC AGAAGAATAT 660 

TATAATGTCA CAGTTCAGTG AAGCAGACTG TGGTGAATTT AGCAACACTT ATGAAGTTTC 720 

TTAAAGTGGC TCATACACAC TTAAAAGGCT TAATGTTTCT CTGGAAAGCG TCCCAGAATA 780 

TTAGCCAGTT TTCTGTCACA TGCTGGTTTG TTTGCTTGCT TGTTTACTTG CTTGTTTACC 840 

AATAGAGTTG ACCTGTTATT GGATTTCCTG GAAGATGTGG TAGCTACTTT TTTCCTATTT 900 

TGAAGCCATT TTCGTAGAGA AATATCCTTC ACTATAATCA AATAAGTTTT GTCCCATCAA 960 

TTCCAAAGAT GTTTCCAGTG GTGCTCTTGA AGAGGAATGA GTACCAGTTT TAAATTGCCC 1020 

ATTGGCATTT GAAGGTAGTT GAGTATGTGT TCTTTATTCC TAGAAGCCAC TGTGCTTGGT 1080 

AGAGTGCATC ACTCACCACA GCTGCCTCTT GAGCTGCCTG AGCCTGGTGC AAAAGGATTG 1140 

GCCCCCATTA TGGTGCTTCT GAATAAATCT TGCCAAGATA GACAAACAAT GATGAAACTC 1200 

AGATGGAGCT TCCTACTCAT GTTGATTTAT GTCTCACAAT CCTGGGTATT GTTAATTCAA 1260 

CATAGGGTGA AACTATTTCT GATAAAGAAC TTTTGAAAAA CTTTTTATAC TCTAAAGTGA 1320 

TACTCAGAAC AAAAGAAAGT CATAAAACTC CTGAATTTAA TTTCCCCACC TAAGTCGAGA 1380 

CAGTATTATC AAAACACATG TGCACACAGA TTATTTTTTG GCTCCAAAAC TGGATTGCAA 1440 

AAGAAAGAGG AGAGATATTT TGTGTGTTCC TGGTATTCTT TTATAAGTAA AGTTACCCAG 1500 

GCATGGACCA GCTTCAGCCA GGGACAAAAT CCCCTCCCAA ACCACTCTCC ACAGCTTTTT 1560 

AAAAATACTT CTACTCTTAA CAATTACCTA AGGTTCCTTC AAACCCCCCC AACTCTTAAT 1620 

AGCTTCTAGT GCTGCTACAA TCTAAGTCAG GTCACCAGAG GGAAGAGAAC ATGGCATTAA 1680 

AAGAATCACA TCTTCAGAAG AGAAGACACT AATATTATTA CCCATATACA TGATTTCAGA 1740 

AGATGACATA AGATTCCTCT TAAAGAGGAA ATGTCAGGAA TCAAGCCACT GAATCCTTAA 1800 

AGAGAAAAGT TGAATATGAG TCATTGTGTC TGAAAACTGC AAAGTGAACT TAACTGAGAT 1860 

CCAGCAAACA GGTTCTGTTT AAGAAAAATA ATTTATACTA AATTTAGTAA AATGGACTTC 1920 

TTATTCAAAG CATCAATAAT TAAAAGAATT ATTTTAAAAA AAAAAAAAAA AAAAAAAAAA 1980 

AAAAAAAAAT TCCTGCGGCC GC 2002 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1322 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



GAATTCGGCA CGAGGGCCAC GACTCTGCTG GCATTTCTTC TATAGCCACT GGAATCTGAT 60 

CCTGATTGTC TTCCACTACT ACCAGGCCAT CACCACTCCG CCTGGGTACC CACCCCAGGG 120 

CAGGAATGAT ATCGCCACCG TCTCCATCTG TAAGAAGTGC ATTTACCCCA AGCCAGCCCG 180 

AACACACCAC TGCAGCATCT GCAACAGGTG TGTGCTGAAG ATGGATCACC ACTGCCCCTG 240 

GCTAAACAAT TGTGTGGGCC ACTATAACCA TCGGTACTTC TTCTCTTTCT GCTTTTTCAT 300 

GACTCTGGGC TGTGTCTACT GCAGCTATGG AAGTTGGGAC CTTTTCCGGG AGGCTTATGC 360 

TGCCATTGAG AAAATGAAAC AGCTCGACAA GAACAAACTA CAGGCGGTTG CCAACCAGAC 420 

TTATCACCAG ACCCCACCAC CCACCTTCTC CTTTCGAGAA AGGATGACTC ACAAGAGTCT 480 

TGTCTACCTC TGGTTCCTGT GCAGTTCTGT GGCACTTGCC CTGGGTGCCC TAACTGTATG 540 

GCATGCTGTT CTCATCAGTC GAGGTGAGAC TAGCATCGAA AGGCACATCA ACAAGAAGGA 600 

GAGACGTCGG CTACAGGCCA AGGGCAGAGT ATTTAGGAAT CCTTACAACT ACGGCTGCTT 660 

GGACAACTGG AAGGTATTCC TGGGTGTGGA TACAGGAAGG CACTGGCTTA CTCGGGTGCT 720 

CTTACCTTCT ACTCACTTGC CCCATGGGAA TGGAATGAGC TGGGAGCCCC CTCCCTGGGT 780 

GACTGCTCAC TCAGCCTCTG TGATGGCAGT GTGAGCTGGA CTGTGTCAGC CACGACTCGA 840 

GCACTCATTC TGCTCCCTAT GTTATTTCAA GGGCCTCCAA GGGCAGCTTT TCTCAGAATC 900 

CTTGATCAAA AAGAGCCAGT GGGCCTGCCT TAGGGTACCA TGCAGGACAA TTCAAGGACC 960 

AGCCTTTTTA CCACTGCAGA AGAAAGACAC AATGTGGAGA AATCTTAGGA CTGACATCCC 1020 

TTTACTCAGG CAAACAGAAG TTCCAACCCC AGACTAGGGG TCAGGCAGCT AGCTACCTAC 1080 

CTTGCCCAGT GCTGACCCGG ACCTCCTCCA GGATACAGCA CTGGAGTTGG CCACCACCTC 1140 

TTCTACTTGC TGTCTGAAAA AACACCTGAC TAGTACAGCT GAGATCTTGG CTTCTCAACA 1200 

GGGCAAAGAT ACCAGGCCTG CTGCTGAGGT CACTGCCACT TCTCACATGC TGCTTAAGGG 1260 

AGCACAAATA AAGGTATTCG ATTTTTAAAA AAAAAAAAAA AAAAAAAAAT TCCTGCGGCC 1320 
GC 1322 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



GAATTCGGCA CGAGGAGCCT GCCTTCATCT AGGATGGCTC CTCTGGGCAT GCTGCTTGGG 60 

CTGCTGATGG CCGCCTGCTT CACCTTCTGC CTCAGTCATC AGAACCTGAA GGAGTTTGCC 120 

CTGACCAACC CAGAGAAGAG CAGCACCAAA GAAACAGAGA GAAAAGAAAC CAAAGCCGAG 180 

GAGGAGCTGG ATGCCGAAGT CCTGGAGGTG TTCCACCCGA CGCATGAGTG GCAGGCCCTT 240 

CAGCCAGGGC AGGCTGTCCC TGCAGGATCC CACGTACGGC TGAATCTTCA GACTGGGGAA 300 

AGAGAGGCAA AACTCCAATA TGAGGACAAG TTCCGAAATA ATTTGAAAGG CAAAAGGCTG 360 

GATATCAACA CCAACACCTA CACATCTCAG GATCTCAAGA GTGCACTGGC AAAATTCAAG 420 

GAGGGGGCAG AGATGGAGAG TTCAAAGGAA GACAAGGCAA GGCAGGCTGA GGTAAAGCGG 480 

CTCTTCCGCC CCATTGAGGA ACTGAAGAAA GACTTTGATG AGCTGAATGT TGTCATTGAG 540 

ACTGACATGC AGATCATGGT ACGGCTGATC AACAAGTTCA ATAGTTCCAG CTCCAGTTTG 600 

GAAGAGAAGA TTGCTGCGCT CTTTGATCTT "pAATATTATG TCCATCAGAT GGACAATGCG 660 

CAGGACCTGC TTTCCTTTGG TGGTCTTCAA GTGGTGATCA ATGGGCTGAA CAGCACAGAG 720 

CCCCTCGTGA AGGAGTATGC TGCGTTTGTG CTGGGCGCTG CCTTTTCCAG CAACCCCAAG 780 

GTCCAGGTGG AGGCCATCGA AGGGGGAGCC CTGCAGAAGC TGCTGGTCAT CCTGGCCACG 840 

GAGCAGCCGC TCACTGCAAA GAAGAAGGTC CTGTTTGCAC TGTGCTCCCT GCTGCGCCAC 900 

TTCCCCTATG CCCAGCGGCA GTTCCTGAAG CTCGGGGGGC TGCAGGTCCT GAGGACCCTG 960 

GTGCAGGAGA AGGGCACGGA GGTGCTCGCC GTGCGCGTGG TCACACTGCT CTACGACCTG 1020 

GTCACGGAGA AGATGTTCGC CGAGGAGGAG GCTGAGCTGA CCCAGGAGAT GTCCCCAGAG 1080 

AAGCTGCAGC AGTATCGCCA GGTACACCTC CTGCCAGGCC TGTGGGAACA GGGCTGGTGC 1140 

GAGCATGATG CCCGTGAGAA GGTGCTGCAG 1200 

ACACTGGGCG TCCTCCTGAC CACCTGCCGG GACCGCTACC GTCAGGACCC CCAGCTCGGC 1260 

AGGACACTGG CCAGCCTGCA GGCTGAGTAC CAGGTGCTGG CCAGCCTGGA GCTGCAGGAT 1320 

GGTGAGGACG AGGGCTACTT CCAGGAGCTG CTGGGCTCTG TCAACAGCTT GCTGAAGGAG 1380 

CTGAGATGAG GCCCCACACC AGGACTGGAC TGGGATGCCG CTAGTGAGGC TGAGGGGTGC 1440 

CAGCGTGGGT GGGCTTCTCA GGCAGGAGGA CATCTTGGCA GTGCTGGCTT GGCCATTAAA 1500 

TGGAAACCTG AAGGCCAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 1560 

TTCCTGCGGC CGC 1573 


(2) INFORMATION FOR SEQ ID NO: 8: 









(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1185 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 






GAATTCGGCA CGAGGGGGCT TTAAGGGACA GCTGAGCCGG CAGGTGGCAG ATCAGATGTG 60 

GCAGGCTGGG AAAAGACAAG CCTCCAGGGC CTTCAGCTTG TACGCCAACA TCGACATCCT 120 

CAGACCCTAC TTTGATGTGG AGCCTGCTCA GGTGCGAAGC AGGCTCCTGG AGTCCATGAT 180 

CCCTATCAAG ATGGTCAACT TCCCCCAGAA AATTGCAGGT GAACTCTATG GACCTCTCAT 240 

GCTGGTCTTC ACTCTGGTTG CTATCCTACT CCATGGGATG AAGACGTCTG ACACTATTAT 300 

CCGGGAGGGC ACCCTGATGG GCACAGCCAT TGGCACCTGC TTCGGCTACT GGCTGGGAGT 360 

CTCATCCTTC ATTTACTTCC TTGCCTACCT GTGCAACGCC CAGATCACCA TGCTGCAGAT 420 

GTTGGCACTG CTGGGCTATG GCCTCTTTGG GCATTGCATT GTCCTGTTCA TCACCTATAA 480 

TATCCACCTC CACGCCCTCT TCTACCTCTT CTGGCTGTTG GTGGGTGGAC TGTCCACACT 540 

GCGCATGGTA GCAGTGTTGG TGTCTCGGAC CGTGGGCCCC ACACAGCGGC TGCTCCTCTG 600 

TGGCACCCTG GCTGCCCTAC ACATGCTCTT CCTGCTCTAT CTGCATTTTG CCTACCACAA 660 
AGTGGTAGAG GGGATCCTGG ACACACTGGA GGGCCCCAAC ATCCCGCCCA TCCAGAGGGT 720 

CCCCAGAGAC ATCCCTGCCA TGCTCCCTGC TGCTCGGCTT CCCACCACCG TCCTCAACGC 780 

CACAGCCAAA GCTGTTGCGG TGACCCTGCA GTCACACTGA CCCCACCTGA AATTCTTGGC 840 

CAGTCCTCTT TCCCGCAGCT GCAGAGAGGA GGAAGACTAT TAAAGGACAG TCCTGATGAC 900 

ATGTTTCGTA GATGGGGTTT GCAGCTGCCA CTGAGCTGTA GCTGCGTAAG TACCTCCTTG 960 

ATGCCTGTCG GCACTTCTGA AAGGCACAAG GCCAAGAACT CCTGGCCAGG ACTGCAAGGC 1020 

TCTGCAGCCA ATGCAGAAAA TGGGTCAGCT CCTTTGAGAA CCCCTCCCCA CCTACCCCTT 1080 

CCTTCCTCTT TATCTCTCCC ACATTGTCTT GCTAAATATA GACTTGGTAA TTAAAATGTT 1140 

GATTGAAGTC TGGAAAAAAA AAAAAAAAAA AATTCCTGCG GCCGC 1185 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1226 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

GAATTCGGCA CGAGGCAAGC CACCATCTTC CTTCGGCCTG CACCCCTTTA AAGGCACCCA 60 

GACCCCTCTG GAAAAAGATG AACTGAAGCC CTTTGACATC CTCCAGCCTA AGGAGTACTT 120 

CCAGCTCAGC CGCCACACGG TCATTAAGAT GGGAAGTGAG AACGAGGCCC TGGATCTCTC 180 

CATGAAGTCA GTGCCCTGGC TCAAGGCTGG TGAAGTCAGT CCCCCAATCT TCCAGGAAGA 240 

TGCAGCCCTA GACCTGTCAG TGGCAGCCCA CCGGAAATCC GAGCCTCCCC CTGAGACACT 300 

GTATGACAGT GGTGCATCAG TGGACAGCTC AGGTCACACA GTGATGGAGA AACTTCCCAG 360 

TGGCATGGAA ATTTCTTTTG CCCCTGCCAC GTCCCATGAG GCCCCAGCCA TGATGGATAG 420 

TCACATCAGC AGCAGTGATG CTGCTACCGA GATGCTCAGC CAGCCCAACC ACCCCAGCGG 480 

CGAAGTCAAG GCTGAAAATA ACATTGAGAT GGTGGGCGAG TCCCAGGCGG CCAAGGTCAT 540 

TGTCTCTGTC GAAGATGCTG TGCCTACCAT ATTCTGTGGC AAGATCAAAG GCCTCTCAGG 600 

GGTGTCCACC AAAAACTTCT CCTTCAAAAG AGAAGACTCC GTGCTTCAGG GCTATGACAT 660 

CAACAGCCAA GGGGAAGAGT CCATGGGAAA TGCAGAGCCC CTJAGGAAAC CCATCAAAAA 720 

CCGGAGCATA AAGTTAAAGA AAGTGAACTC CCAGGAAGTA CACATGCTCC CAATCAAAAA 780 

ACAACGGCTG GCCACCTTTT TTCCAAGAAA GTAAATAACG GCTTTTTAAA ATTTGTATGA 840 

TTATAATATG GGGAAAGGTG CATTGGTTTT ATAAAAAGGC ATTTAAAACA AATTATCTTT 900 
GTTAATTATT TTGGGGAGTA GTTGGGAAAT GGAAAGGTGA ATTGGCTCTA GAGGCCCTGT 960 

ATGCTAGTAT CATTTTCTTT TTTAATTTTT GACTTTTCAC AAATGAGTAA ATAAGAGCAA 1020 

CCTATTTTTC AAGCAGATTG CACATTTTTT GCAGCTTTAA TGGAATATTG GGTGAATTAG 1080 

AGGGGTAAAA AAAGCTATTT TCATTGCCAC AAAGTGCTTT GATGATGTAA TACCTAATAA 1140 

AGGGTAGGAT GAATATTTCA CAATAAATGT TTGTTTGCAC TAAAAAAAAA AAAAAAAAAA 1200 

AAAAAAAAAA AAATTCCTGC GGCCGC 1226 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1049 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

GAATTCGGCA CGAGGGCGCC ATGGTGAAGG TGACGTTCAA CTCCGCTCTG GCCCAGAAGG 60 

AGGCCAAGAA GGACGAGCCC AAGAGCGGCG AGGAGGCGCT CATCATCCCC CCCGACGCCG 120 

TCGCGGTGGA CTGCAAGGAC CCAGATGATG TGGTACCAGT TGGCCAAAGA AGAGCCTGGT 180 

GTTGGTGCAT GTGCTTTGGA CTAGCATTTA TGCTTGCAGG TGTTATTCTA GGAGGAGCAT 240 

ACTTGTACAA ATATTTTGCA CTTCAACCAG ATGACGTGTA CTACTGTGGA ATAAAGTACA 300 

TCAAAGATGA TGTCATCTTA AATGAGCCCT CTGCAGATGC CCCAGCTGCT CTCTACCAGA 360 

CAATTGAAGA AAATATTAAA ATCTTTGAAG AAGAAGAAGT TGAATTTATC AGTGTGCCTG 420 

TCCCAGAGTT TGCAGATAGT GATCCTGCCA ACATTGTTCA TGACTTTAAC AAGAAACTTA 480 

CAGCCTATTT AGATCTTAAC CTGGATAAGT GCTATGTGAT CCCTCTGAAC ACTTCCATTG 540 

TTATGCCACC CAGAAACCTA CTGGAGTTAC TTATTAACAT CAAGGCTGGA ACCTATTTGC 600 

CTCAGTCCTA TCTGATTCAT GAGCACATGG TTATTACTGA TCGCATTGAA AACATTGATC 660 

ACCTGGGTTT CTTTATTTAT CGACTGTGTC ATGACAAGGA AACTTACAAA CTGCAACGCA 720 

GAGAAACTAT TAAAGGTATT CAGAAACGTG AAGCCAGCAA TTGTTTCGCA ATTCGGCATT 780 

TTGAAAACAA ATTTGCCGTG GAAACTTTAA TTTGTTCTTG AACAGTCAAG AAAAACATTA 840 

TTGAGGAAAA TTAATATCAC AGCATAACCC CACCCTTTAC ATTTTGTTGC AGTTGATTAT 900 

TTTTTAAAGT CTTCTTTCAT GTAAGTAGCA AACAGGGCTT TACTATCTTT TCATCTCATT 960 

AATTCAATTA AAACCATTAC CTTAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 1020 

AAAAAAAAAA AAAAAATTCC TGCGGCCGC 1049 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1142 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 



GAATTCGGCA CGAGGGGAGA ATACTTTTTG CGATGCCTAC TGGAGACTTT GATTCGAAGC 60 

CCAGTTGGGC CGACCAGGTG GAGGAGGAGG GGGAGGACGA CAAATGTGTC ACCAGCGAGC 120 

TCCTCAAGGG GATCCCTCTG GCCACAGGTG ACACCAGCCC AGAGCCAGAG CTACTGCCGG 180 

GAGCTCCACT GCCGCCTCCC AAGGAGGTCA TCAACGGAAA CATAAAGACA GTGACAGAGT 240 

ACAAGATAGA TGAGGATGGC AAGAAGTTCA AGATTGTCCG CACCTTCAGG ATTGAGACCC 300 

GGAAGGCTTC AAAGGCTGTC GCAAGGAGGA AGAACTGGAA GAAGTTCGGG AACTCAGAGT 360 

TTGACCCCCC CGGACCCAAT GTGGCCACCA CCACTGTCAG TGACGATGTC TCTATGACGT 420 

TCATCACCAG CAAAGAGGAC CTGAACTGCC AGGAGGAGGA GGACCCTATG AACAAATTCA 480 

AGGGCCAGAA GATCGTGTCC TGCCGCATCT GCAAGGGCGA CCACTGGACC ACCCGCTGCC 540 

CCTACAAGGA TACGCTGGGG CCCATGCAGA AGGAGCTGGC CGAGCAGCTG GGCCTGTCTA 600 

CTGGCGAGAA GGAGAAGCTG CCGGGAGAGC TAGAGCCGGT GCAGGCCACG CAGAACAAGA 660 

CAGGGAAGTA TGTGCCGCCG AGCCTGCGCG ACGGGGCCAG CCGCCGCGGG GAGTCCATGC 720 

AGCCCAACCG CAGAGCCGAC GACAACGCCA CCATCCGTGT CACCAACTTG CGCAGAGGAC 780 

ACGCGTGAGA CCGACCTGCA GGAGCTCTTC CGGCCTTTCG GCTCCATCTC CCGCATCTAC 840 

CTGGCTAAGG ACAAGACCAC TGGCCAATCC AAGGGCTTTG CCTTCATCAG CTTCCACCGC 900 

CGCGAGGATG CTGCGCGTGC CATTGCCGGG GTGTCCGGCT TTGGCTACGA CCACCTCATC 960 

CTCAACGTCG AGTGGGCCAA GCCGTCCACC AACTAAGCCA GCTGCCACTG TGTACTCGGT 1020 

CCGGGACCCT TGGCGACAGA AGACAGCCTC CGAGAGCGCG GGCTCCAAGG GCAATAAAGC 1080 

AGCTCCACTC TCAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAT TCCTGCGGCC 1140 

GC 1142 



(2) INFORMATION FOR SEQ ID NO: 12: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1696 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 



GAATTCGGCA CGAGGGAAAC ATGGCGGTAG GCTGGGACCA TAACACAAGC ATGACTATAT 60 

GAAGGAAGAG GAAGGTTTTC CTGAAGATGA GGCGACTGAA TCGGAAAAAA ACTTTAAGTT 120 

TGGTAAAAGA GTTGGATGCC TTTCCGAAGG TTCCTGAGAG CTATGTAGAG ACTTCAGCCA 180 

GTGGAGGTAC AGTTTCTCTA ATAGCATTTA CAACTATGGC TTTATTAACC ATAATGGAAT 240 

TCTCAGTATA TCAAGATACA TGGATGAAGT ATGAATACGA AGTAGACAAG GATTTTTCTA 300 

GCAAATTAAG AATTAATATA GATATTACTG TTGCCATGAA GTGTCAATAT GTTGGAGCGG 360 

ATGTATTGGA TTTAGCAGAA ACAATGGTTG CATCTGCAGA TGGTTTAGTT TATGAACCAA 420 

CAGTATTTGA TCTTTCACCA CAGCAGAAAG AGTGGCAGAG GATGCTGCAG CTGATTCAGA 480 

GTAGGCTACA AGAAGAGCAT TCACTTCAAG ATGTGATATT TAAAAGTGCT TTTAAAAGTA 540 

CATCAACAGC TCTTCCACCA AGAGAAGATG ATTCATCACA GTCTCCAAAT GCATGCAGAA 600 

TTCATGGCCA TCTATATGTC AATAAAGTAG CAGGGAATTT TCACATAACA GTGGGCAAGG 660 

CAATTCCACA TCCTCGTGGT CATGCACATT TGGCAGCACT TGTCAACCAT GAATCTTACA 720 

ATTTTTCTCA TAGAATAGAT CATTTGTCTT TTGGAGAGCT TGTTCCAGCA ATTATTAATC 780 

CTTTAGATGG AACTGAAAAA ATTGCTATAG ATCACAACCA GATGTTCCAA TATTTTATTA 840 

CAGTTGTGCC AACAAAACTA CATACATATA AAATATCAGC AGACACCCAT CAGTTTTCTG 900 

TGACAGAAAG GGAACGTATC ATTAACCATG CTGCAGGCAG CCATGGAGTC TCTGGGATAT 960 

TTATGAAATA TGATCTCAGT TCTCTTATGG TGACAGTTAC TGAGGAGCAC ATGCCATTCT 1020 

GGCAGTTTTT TGTAAGACTC TGTGGTATTG TTGGAGGAAT CTTTTCAACA ACAGGCATGT 1080 

TACATGGAAT TGGAAAATTT ATAGTTGAAA TAATTTGCTG TCGTTTCAGA CTTGGATCCT 1140 

ATAAACCTGT CAATTCTGTT CCTTTTGAGG ATGGCCACAC AGACAACCAC TTACCTCTTT 1200 

TAGAAAATAA TACACATTAA CACCTCCCGA TTGAAGGAGA AAAACTTTTT GCCTGAGACA 1260 

TAAAACCTTT TTTTAATAAT AAAATATTGT GCAATATATT CAAAGAAAAG AAAACACAAA 1320 

TAAGCAGAAA ACATACTTAT TTTAAAAAAG AAAAAAAAGG ATAAAAAAAC CCAAACTGAA 1380 

ATTCTATATA CGTTGTGTCT GTTACAAATG TCGTAGAAGA AATCATGCAG CTAAACGATG 1440 

AAGAAGCCCA ACTGGAGTGT TGCTTTGAAG ATGACGCCTT CTTATATTTT CATAGCAAAT 1500 

GGGTGGTATC AAAATCAGAC ATTGCTTCTT GCTGATAAAA AGCCTGAAGG AAATAAGTGA 1560 

AACTACATCT ATGGGAAAAA AAAAAACATT GAGAAGTGCA AATGTTCGCA TCCTTTTGTT 1620 

TTTAAAAGAT ATGATGTCAG AATAAAATGT GGAAAACATA CGGAAAAAAA AAAAAAAAAA 1680 

AAATTCCTGC GGCCGC 1696 
(2) INFORMATION FOR SEQ ID NO: 13: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1100 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 



GAATTCGGCA CGAGGCGGCA CGAGGCGGCA CGAGGGTGGC ATATCACGGC CATGGGGTCT 60 

CAGCATTCCG CTGCTGCTCG CCCCTCCTCC TGCAGGCGAA AGCAAGAAGA TGACAGGGAC 120 

GGTTTGCTGG CTGAACGAGA GCAGGAAGAA GCCATTGCTC AGTTCCCATA TGTGGAATTC 180 

ACCGGGAGAG ATAGCATCAC CTGTCTCACG TGCCAGGGGA CAGGCTACAT TCCAACAGAG 240 

CAAGTAAATG AGTTGGTGGC TTTGATCCCA CACAGTGATC AGAGATTGCG CCCTCAGCGA 300 

ACTAAGCAAT ATGTCCTCCT GTCCATCCTG CTTTGTCTCC TGGCATCTGG TTTGGTGGTT 360 

TTCTTCCTGT TTCCGCATTC AGTCCTTGTG GATGATGACG GCATCAAAGT GGTGAAAGTC 420 

ACATTTAATA AGCAAGACTC CCTTGTAATT CTCACCATCA TGGCCACCCT GAAAATCAGG 480 

AACTCCAACT TCTACACGGT GGCAGTGACC AGCCTGTCCA GCCAGATTCA GTACATGAAC 540 

ACAGTGGTCA GTACATATGT GACTACTAAC GTCTCCCTTA TTCCACCTCG GAGTGAGCAA 600 

CTGGTGAATT TTACCGGGAA GGCCGAGATG GGAGGACCGT TTTCCTATGT GTACTTCTTC 660 

TGCACGGTAC CTGAGATCCT GGTGCACAAC ATAGTGATCT TCATGCGAAC TTCAGTGAAG 720 

ATTTCATACA TTGGCCTCAT GACCCAGAGC TCCTTGGAGA CACATCACTA TGTGGATTGT 780 

GGAGGAAATT CCACAGCTAT TTAACAACTG CTATTGGTTC TTCCACACAG CGCCTGTAGA 840 

AGAGAGCACA GCATATGTTC CCAAGGCCTG AGTTCTGGAC CTACCCCCAC GTGGTGTAAG 900 

CAGAGGAGGA ATTGGTTCAC TTAACTCCCA GCAAACATCC TCCTGCCACT TAGGAGGAAA 960 

CACCTCCCTA TGGTACCATT TATGTTTCTC AGAACCAGCA GAATCAGTGC CTAGCCTGTG 1020 

CCCAGCAAAT AGTTGGCACT CAATAAAGAT TTGCAGAATT TAAAAAAAAA AAAAAAAAAA 1080 

AAAAAAATTC CTGCGGCCGC 1100 



(2) INFORMATION FOR SEQ ID NO: 14: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1588 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 



GAATTCGGCA CGAGGGTACC TGCTTTTCTA TTGCCTCTTT GAAACAATGG TCACGTGTTT 60 

CCATGTTCCC TACTCGGCTC TCACCATGTT CATCAGCACC GAGCAGACTG AGCGGGATTC 120 

TGCCACCGCC TATCGGATGA CTGTGGAAGT GCTGGGCACA GTGCTGGGCA CGGCGATCCA 180 

GGGACAAATC GTGGGCCAAG CAGACACGCC TTGTTTCCAG GACCTCAATA GCTCTACAGT 240 

AGCTTCACAA AGTGCCAACC ATACACATGG CACCACCTCA CACAGGGAAA CGCAAAAGGC 300 

ATACCTGCTG GCAGCGGGGG TCATTGTCTG TATCTATATA ATCTGTGCTG TCATCCTGAT 360 

CCTGGGCGTG CGGGAGCAGA GAGAACCCTA TGAAGCCCAG CAGTCTGAGC CAATCGCCTA 420 

CTTCCGGGGC CTACGGCTGG TCATGAGCCA CGGCCCATAC ATCAAACTTA TTACTGGCTT 480 

CCTCTTCACC TCCTTGGCTT TCATGCTGGT GGAGGGGAAC TTTGTCTTGT TTTGCACCTA 540 

CACCTTGGGC TTCCGCAATG AATTCCAGAA TCTACTCCTG GCCATCATGC TCTCGGCCAC 600 

TTTAACCATT CCCATCTGGC AGTGGTTCTT GACCCGGTTT GGCAAGAAGA CAGCTGTATA 660 

TGTTGGGATC TCATCAGCAG TGCCATTTCT CATCTTGGTG GCCCTCATGG AGAGTAACCT 720 

CATCATTACA TATGCGGTAG CTGTGGCAGC TGGCATCAGT GTGGCAGCTG CCTTCTTACT 780 

ACCCTGGTCC ATGCTGCCTG ATGTCATTGA CGACTTCCAT CTGAAGCAGC CCCACTTCCA 840 

TGGAACCGAG CCCATCTTCT TCTCCTTCTA TGTCTTCTTC ACCAAGTTTG CCTCTGGAGT 900 

GTCACTGGGC ATTTCTACCC TCAGTCTGGA CTTTGCAGGG TACCAGACCC GTGGCTGCTC 960 

GCAGCCGGAA CGTGTCAAGT TTACACTGAA CATGCTCGTG ACCATGGCTC CCATAGTTCT 1020 

CATCCTGCTG GGCCTGCTGC TCTTCAAAAT GTACCCCATT GATGAGGAGA GGCGGCGGCA 1080 

GAATAAGAAG GCCCTGCAGG CACTGAGGGA CGAGGCCAGC AGCTCTGGCT GCTCAGAAAC 1140 

AGACTCCACA GAGCTGGCTA GCATCCTCTA GGGCCCGCCA CGTTGCCCGA AGCCACCATG 1200 

CAGAAGGCCA CAGAAGGGAT CAGGACCTGT CTGCCGGCTT GCTGAGCAGC TGGACTGCAG 1260 

GTGCTAGGAA GGGAACTGAA GACTCAAGGA GGTGGCCCAG GACACTTGCT GTGCTCACTG 1320 

TGGGGCCGGC TGCTCTGTGG CCTCCTGCCT CCCCTCTGCC TGCCTGTGGG GCCAAGCCCT 1380 

GGGGCTGCCA CTGTGAATAT GCCAAGGACT GATCGGGCCT AGCCCGGAAC ACTAATGTAG 1440 

AAACCTTTTT TTTACAGAGC CTAATTAATA ACTTAATGAC TGTGTACATA GCAATGTGTG 1500 

TGTATGTATA TGTCTGTGAG CTATTAATGT TATTAATTTT CATAAAAGCT GGAAAGCAAA 1560 

AAAAAAAAAA AAAAATTCCT GCGGCCGC 1588 



(2) INFORMATION FOR SEQ ID NO: 15: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1535 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 




GAATTCGGCA CGAGGCGGAA GTCCCGTCTC ACGGTTGCCC TGGCAGCGCG CGAGGCTGGT 60 

GAGTCGGCAG CCCTGTGGCA GCCGGCGGGC TGGTTTCCAT GGTTGCACGA TTAGGAACCA 120 

CCAGCTGCTG CATCCCATGG CCAGGGGTGG CGTCCAGGTG GCAGAGCAGC TAGGAACGCA 180 

AGGCCTGAAC CTGGGGCCAG ACACCCTGCT CTCCCGGCCA TGGTCAACGA CCCTCCAGTA 240 

CCTGCCTTAC TGTGGGCCCA GGAGGTGGGC CAAGTCTTGG CAGGCCGTGC CCGCAGGCTG 300 

CTGCTGCAGT TTGGGGTGCT CTTCTGCACC ATCCTCCTTT TGCTCTGGGT GTCTGTCTTC 360 

CTCTATGGCT CCTTCTACTA TTCCTATATG CCGACAGTCA GCCACCTCAG CCCTGTGCAT 420 

TTCTACTACA GGACCGACTG TGATTCCTCC ACCACCTCAC TCTGCTCCTT CCCTGTTGCC 480 

AATGTCTCGC TGACTAAGGG TGGACGTGAT CGGGTGCTGA TGTATGGACA GCCGTATCGT 540 

GTTACCTTAG AGCTTGAGCT GCCAGAGTCC CCTGTGAATC AAGATTTGGG CATGTTCTTG 600 

GTCACCATTT CCTGCTACAC CAGAGGTGGC CGAATCATCT CCACTTCTTC GCGTTCGGTG 660 

ATGCTGCATT ACCGCTCAGA CCTGCTCCAG ATGCTGGACA CACTGGTCTT CTCTAGCCTC 720 

CTGCTATTTG GCTTTGCAGA GCAGAAGCAG CTGCTGGAGG TGGAACTCTA CGCAGACTAT 780 

AGAGAGAACT CGTACGTGCC GACCACTGGA GCGATCATTG AGATCCACAG CAAGCGCATC 840 

CAGCTGTATG GAGCCTACCT CCGCATCCAC GCGCACTTCA CTGGGCTCAG ATACCTGCTA 900 

TACAACTTCC CGATGACCTG CGCCTTCATA GGTGTTGCCA GCAACTTCAC CTTCCTCAGC 960 

GTCATCGTGC TCTTCAGCTA CATGCAGTGG GTGTGGGGGG GCATCTGGCC CCGACACCGC 1020 

TTCTCTTTGC AGGTTAACAT CCGAAAAAGA GACAATTCCC GGAAGGAAGT CCAACGAAGG 1080 

ATCTCTGCTC ATCAGCCAGG GCCTGAAGGC CAGGAGGAGT CAACTCCGCA ATCAGATGTT 1140 

ACAGAGGATG GTGAGAGCCC TGAAGATCCC TCAGGGACAG AGGTCAGCTG TCCGAGGAGG 1200 

AGAAACCAGA TCAGCAGCCC CTGAGCGGAG AAGAGGAGCT AGAGCCTGAG GCCAGTGATG 1260 

GTTCAGGCTC CTGGGAAGAT GCAGCTTTGC TGACGGAGGC CAACCTGCCT GCTCCTGCTC 1320 

CTGCTTCTGC TTCTGCCCCT GTCCTAGAGA CTCTGGGCAG CTCTGAACCT GCTGGGGGTG 1380 

CTCTCCGACA GCGCCCCACC TGCTCTAGTT CCTGAAGAAA AGGGGCAGAC TCCTCACATT 1440 

CCAGCACTTT CCCACCTGAC TCCTCTCCCC TCGTTTTTCC TTCAATAAAC TATTTTGTGT 1500 

CAAAAAAAAA AAAAAAAAAA AATTCCTGCG GCCGC 1535 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1322 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

GAATTCGGCA CGAGGGCGGG CGCTACGGGC TTGACTCCCC CAAGGCCGAG GTCCGCGGCC 60 

AGGTGCTGGC GCCGCTGCCC CTCCACGGAG TTGCTGATCA TCTGGGCTGT GATCCACAAA 120 

CCCGGTTCTT TGTCCCTCCT AATATCAAAC AGTGGATTGC CTTGCTGCAG AGGGGAAACT 180 

GCACGTTTAA AGAGAAAATA TCACGGGCCG CTTTCCACAA TGCAGTTGCT GTAGTCATCT 240 

ACAATAATAA ATCCAAAGAG GAGCCAGTTA CCATGACTCA TCCAGGCACT GGAGATATTA 300 

TTGCTGTCAT GATAACAGAA TTGAGGGGTA AGGATATTTT GAGTTATCTG GAGAAAAACA 360 

TCTCTGTACA AATGACAATA GCTGTTGGAA CTCGAATGCC ACCGAAGAAC TTCAGCCGTG 420 

GCTCTCTAGT CTTCGTGTCA ATATCCTTTA TTGTTTTGAT GATTATTTCT TCAGCATGGC 480 

TCATATTCTA CTTCATTCAA AAGATCAGGT ACACAAATGC ACGCGACAGG AACCAGCGTC 540 

GTCTCGGAGA TGCAGCCAAG AAAGCCATCA GTAAATTGAC AACCAGGACA GTAAAGAAGG 600 

GTGACAAGGA AACTGACCCA GACTTTGATC ATTGTGCAGT CTGCATAGAG AGCTATAAGC 660 

AGAATGATGT CGTCCGAATT CTCCCCTGCA AGCATGTTTT CCACAAATCC TGCGTGGAXC 720 

CCTGGCTTAG TGAACATTGT ACCTGTCCTA TGTGCAAACT TAATATATTG AAGGCCCTGG 780 

GAATTGTGCC GAATTTGCCA TGTACTGATA ACGTAGCATT CGATATGGAA AGGCTCACCA 840 

GAACCCAAGC TGTTAACCGA AGATCAGCCC TCGGCGACCT CGCCGGCGAC AACTCCCTTG 900 

GCCTTGAGCC ACTTCGAACT TCGGGGATCT CACCTCTTCC TCAGGATGGG GAGCTCACTC 960 

CGAGAACAGG AGAAATCAAC ATTGCAGTAA CAAAAGAATG GTTTATTATT GCCAGTTTTG 1020 

GCCTCCTCAG TGCCCTCACA CTCTGCTACA TGATCATCAG AGCCACAGCT AGCTTGAATG 1080 

CTAATGAGGT AGAATGGTTT TGAAGAAGAA AAAACCTGCT TTCTGACTGA TTTTGCCTTG 1140 

AAGGAAAAAA GAACCTATTT TTGTGCATCA TTTACCAATC ATGCCACACA AGCATTTATT 1200 

TTTAGTACAT TTTATTTTTT CATAAAATTG CTAATGCCAA AGCTTTGTAT TAAAAGAAAT 1260 

AAATAATAAA ATAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAT TCCTGCGGCC 1320 

GC 1322 

(2) INFORMATION FOR SEQ ID NO: 17: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1711 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

GAATTCGGCA CGAGGCCCTC CCGCGCTCCC GGGGCGCGCG GGCCGCGCCC CCGACGCCCT 60 

ACATATACTC AGGTGCGCCC CACCTGTCCG CCCGCACCTG CTGGCTCACC TCCGAGCCAC 120 

CTCTGCTGCG CACCGCAGCC TCGGACCTAC AGCCCAGGAT ACTTTGGGAC TTGCCGGCGC 180 

TCAGAAACGC GCCCAGACGG CCCCTCCACC TTTTGTTTGC CTAGGGTCGC CGAGAGCGCC 240 

CGGAGGGAAC CGCCTGGCCT TCGGGGACCA CCAATTTTGT CTGGAACCAC CCTCCCGGCG 300 

TATCCTACTC CCTGTGCCGC GAGGCCATCG CTTCACTGGA GGGGTCGATT TGTGTGTAGT 360 

TTGGTGACAA GATTTGCATT CACCTGGCCC AAACCCTTTT TGTCTCTTTG GGTGACCGGA 420 

AAACTCCACC TCAAGTTTTC TTTTGTGGGG CTGCCCCCCA AGTGTCGTTT GTTTTACTGT 480 

AGGGTCTCCC GCCCGGCGCC CCCAGTGTTT TCTGAGGGCG GAAATGGCCA ATTCGGGCCT 540 

GCAGTTGCTG GGCTTCTCCA TGGCCCTGCT GGGCTGGGTG GGTCTGGTGG CCTGCACCGC 600 

CATCCCGCAG TGGCAGATGA GCTCCTATGC GGGTGACAAC ATCATCACGG CCCAGGCCAT 660 

GTACAAGGGG CTGTGGATGG ACTGCGTCAC GCAGAGCACG GGGATGATGA GCTGCAAAAT 720 

GTACGACTCG GTGCTCGCCC TGTCCGCGGC CTTGCAGGCC ACTCGAGCCC TAATGGTGGT 780 

CTCCCTGGTG CTGGGCTTCC TGGCCATGTT TGTGGCCACG ATGGGCATGA AGTGCACGCG 840 

CTGTGGGGGA GACGACAAAG TGAAGAAGGC CCGTATAGCC ATGGGTGGAG GCATAATTTT 900 

CATCGTGGCA GGTCTTGCCG CCTTGGTAGC TTGCTCCTGG TATGGCCATC AGATTGTCAC 960 

AGACTTTTAT AACCCTTTGA TCCCTACCAA CATTAAGTAT GAGTTTGGCC CTGCCATCTT 1020 

TATTGGCTGG GCAGGGTCTG CCCTAGTCAT CCTGGGAGGT GCACTGCTCT CCTGTTCCTG 1080 

TCCTGGGAAT GAGAGCAAGG CTGGGTACCG TGCACCCCGC TCTTACCCTA AGTCCAACTC 1140 

TTCCAAGGAG TATGTGTGAC CTGGGATCTC CTTGCCCCAG CCTGACAGGC TATGGGAGTG 1200 

TCTAGATGCC TGAAAGGGCC TGGGGCTGAG CTCAGCCTGT GGGCAGGGTG CCGGACAAAG 1260 

GCCTCCTGGT CACTCTGTCC CTGCACTCCA TGTATAGTCC TCTTGGGTTG GGGGTGGGGG 1320 

GGTGCCGTTG GTGGGAGAGA CAAAAAGAGG GAGAGTGTGC TTTTTGTACA GTAATAAAAA 1380 

ATAAGTATTG GGAAGCAGGC TTTTTTCCCT TCAGGGCCTC TGCTTTCCTC CCGTCCAGAT 1440 

CCTTGCAGGG AGCTTGGAAC CTTAGTGCAC CTACTTCAGT TCAGAACACT TAGCACCCCA 1500 

CTGACTCCAC TGACAATTGA CTAAAAGATG CAGGTGCTCG TATCTCGACA TTCATTCCCA 1560 

CCCCCCTCTT ATTTAAATAG CTACCAAAGT ACTTCTTTTT TAATAAAAAA ATAAAGATTT 1620 
TTATTAGGTA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 1680 
AAAAAAAAAA AAAAAAAATT CCTGCGGCCG C 1711 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1553 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

GAATTCGGCA CGAGGGCAGG TCCAGAGTAA AGTCACTGAA GAGTGGAAGC GAGGAAGGAA 60 

CAGGATGATT AGACCTCAGC TGCGGACCGC GGGGCTGGGA CGATGCCTCC TGCCGGGGCT 120 

GCTGCTGCTC CTGGTGCCCG TCCTCTGGGC CGGGGCTGAA AAGCTACATA CCCAGCCCTC 180 

CTGCCCCGCG GTCTGCCAGC CCACGCGCTG CCCCGCGCTG CCCACCTGCG CGCTGGGGAC 240 

CACGCCGGTG TTCGACCTGT GCCGCTGTTG CCGCGTCTGC CCCGCGGCCG AGCGTGAAGT 300 

CTGCGGCGGG GCGCAGGGCC AACCGTGCGC CCCGGGGCTG CAGTGCCTCC AGCCGCTGCG 360 

CCCCGGGTTC CCCAGCACCT GCGGTTGCCC GACGCTGGGA GGGGCCGTGT GCGGCAGCGA 420 

CAGGCGCACC TACCCCAGCA TGTGCGCGCT CCGGGCCGAA AACCGCGCCG CGCGCCGCCT 480 

GGGCAAGGTC CCGGCCGTGC CTGTGCAGTG GGGGAACTGC GGGGATACAG GGACCAGAAG 540 

CGCAGGCCCG CTCAGGAGGA ATTACAACTT CATCGCCGCG GTGGTGGAGA AGGTGGCGCC 600 

ATCGGTGGTT CACGTGCAGC TGTGGGGCAG GTTACTTCAC GGCAGCAGGC TTGTTCCTGT 660 

GTACAGTGGC TCTGGGTTCA TAGTGTCTGA GGACGGGCTC ATTATTACCA ATGCCCATGT 720 

TGTCAGGAAC CAGCAGTGGA TTGAGGTGGT GCTCCAGAAT GGGGCCCGTT ATGAAGCTGT 780 

TGTCAAGGAT ATTGACCTTA AATTGGATCT TGCGGTGATT AAGATTGAAT CAAATGCTGA 840 

ACTTCCTGTA CTGATGCTGG GAAGATCATC TGACCTTCGG GCTGGAGAGT TTGTGGTGGC 900 

TTTGGGCAGC CCATTTTCTC TGCAGAACAC AGCTACTGCA GGAATTGTCA GCACCAAACA 960 

GCGAGGGGGC AAAGAACTGG GGATGAAGGA TTCAGATATG GACTACGTCC AGATTGATGC 1020 

CACAATTAAC TATGGGAATT CTGGTGGTCC TCTGGTGAAC TTGGATGGTG ATGTGATTGG 1080 

CGTCAATTCA TTGAGGGTGA CTGATGGAAT CTCCTTTGCA ATTCCTTCAG ATCGAGTTAG 1140 

GCAGTTCTTG GCAGAATACC ATGAGCACCA GATGAAAGGA AAGGCGTTTT CAAATAAGAA 1200 

ATATCTGGGT CTGCAAATGC TGTCCCTCAC TGTGCCCCTT AGTGAAGAAT TGAAAATGCA 1260 

TTATCCAGAT TTCCCTGATG TGAGTTCTGG GGTTTATGTA TGTAAAGTGG TTGAAGGAAC 1320 
AGCTGCTCAA AGCTCTGGAT TGAGAGATCA CGATGTAATT GTCAACATAA ATGGGAAACC 1380 

TATTACTACT ACAACTGATG TTGTTAAAGC TCTTGACAGT GATTCCCTTT CCATGGCTGT 1440 

TCTTCGGGGA AAAGATAATT TGCTCCTGAC AGTCATACCT GAAACAATCA ATTAAATATC 1500 

TTGTTTTAAA GTGGGATTAT CTAAAAAAAA AAAAAAAAAA TTCCTGCGGC CGC 1553 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1596 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 



GAATTCGGCA CGAGGGGAGC CGCTCCCGGA GCCCGGCCGT AGAGGCTGCA ATCGCAGCCG 60 

GGAGCCCGCA GCCCGCGCCC CGAGCCCGCC GCCGCCCTTC GAGGGCGCCC CAGGCCGCGC 120 

CATGGTGAAG GTGACGTTCA ACTCCGCTCT GGCCCAGAAG GAGGCCAAGA AGGACGAGCC 180 

CGAGAGCGGC GAGGAGGCGC TCATCATCCC CCCCGACGCC GTCGCGGTGG ACTGCAAGGA 240 

CCCAGATGAT GTGGTACCAG TTGGCCAAAG AAGAGCCTGG TGTTGGTGCA TGTGCTTTGG 300 

ACTAGCATTT ATGCTTGCAG GTGTTATTCT AGGAGGAGCA TACTTGTACA AATATTTTGC 360 

ACTTCAACCA GATGACGTGT ACTACTGTGG AATAAAGTAC ATCAAAGATG ATGTCATCTT 420 

AAATGAGCCC TCTGCAGATG CCCCAGCTGC TCTCTACCAG ACAATTGAAG AAAATATTAA 480 

AATCTTTGAA GAAGAAGAAG TTGAATTTAT CAGTGTGCCT GTCCCAGAGT TTGCAGATAG 540 

TGATCCTGCC AACATTGTTC ATGACTTTAA CAAGAAACTT ACAGCCTATT TAGATCTTAA 600 

CCTGGATAAG TGCTATGTGA TCCCTCTGAA CACTTCCATT GTTATGCCAC CCAGAAACCT 660 

ACTGGAGTTA CTTATTAACA TCAAGGCTGG AACCTATTTG CCTCAGTCCT ATCTGATTCA 720 

TGAGCACATG GTTATTACTG ATCGCATTGA AAACATTGAT CACCTGGGTT TCTTTATTTA 780 

TCGACTGTGT CATGACAAGG AAACTTACAA ACTGCAACGC AGAGAAACTA TTAAAGGTAT 840 

TCAGAAACGT GAAGCCAGCA ATTGTTTCGC AATTCGGCAT TTTGAAAACA AATTTGCCGT 900 

GGAAACTTTA ATTTGTTCTT GAACAGTCAA GAAAAACATT ATTGAGGAAA ATTAATATCA 960 

CAGCATAACC CCACCCTTTA CATTTTGTGC AGTGATATTT TTTAAAGTCT CTTTCATGTA 1020 

AGTAGCAAAC AGGGCTTTAC TATCTTTTCA TCTCATTAAT TCAATTAAAA CCATTACCTT 1080 

AAAATTTTTT TCTTTCGAAG TGTGGTGTCT TTTATATTTG AATTAGTAAC TGTATGAAGT 1140 

CATAGATAAT AGTACATGTC ACCTTAGGTA GTAGGAAGAA TTACAATTTC TTTAAATCAT 1200 

TTATCTGGAT TTTTATGTTT TATTAGCATT TTCAAGAAGA CGGATTATCT AGAGAATAAT 1260 

CATATATATG CATACGTAAA AATGGACCAC AGTGACTTAT TTGTAGTTGT TAGTTGCCCT 1320 

GCTACCTAGT TTGTTAGTGC ATTTGAGCAC ACATTXT^AT TTTCCTCTAA TTAAAATGTG 1380 

CAGTATTTTC AGTGTCAAAT ATATTTAACT ATTTAGAGAA TGATTTCCAC CTTTATGTTT 1440 

TAATATCCTA GGCATCTGCT GTAATAATAT TTTAGAAAAT GTTTGGAATT TAAGAAATAA 1500 

CTTGTGTTAC TAATTTGTAT AACCCATATC TGTGCAATGG AATATAAATA TCACAAAGTT 1560 

GTTTAAAAAA AAAAAAAAAA AAATTCCTGC GGCCGC 1596 

(2) INFORMATION FOR SEQ ID NO: 20: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Met Ala Trp Arg Arg Arg Glu Ala Gly Val Gly Ala Arg Gly Val Leu 
1 5 10 15 

Ala Leu Ala Leu Leu Ala Leu Ala Leu Cys Val Pro Gly Ala Arg Gly 
20 25 30 

Arg Ala Leu Glu Trp Phe Ser Ala Val Val Asn Ile Glu Tyr Val Asp 
35 40 45 

Pro Gln Thr Asn Leu Thr Val Trp Ser Val Ser Glu Ser Gly Arg Phe 
50 55 60 

Gly Asp Ser Ser Pro Lys Glu Gly Ala His Gly Leu Val Gly Val Pro 
65 70 75 80 

Trp Ala Pro Gly Gly Asp Leu Glu Gly Cys Ala Pro Asp Thr Arg Phe 
85 90 95 

Phe Val Pro Glu Pro Gly Gly Arg Gly Ala Ala Pro Trp Val Ala Leu 
100 105 110 

Val Ala Arg Gly Gly Cys Thr Phe Lys Asp Lys Val Leu Val Ala Ala 
115 120 125 

Arg Arg Asn Ala Ser Ala Val Val Leu Tyr Asn Glu Glu Arg Tyr Gly 
130 135 140 

Asn Ile Thr Leu Pro Met Ser His Ala Gly Thr Gly Asn Ile Val Val 
145 150 155 160 

Ile Met Ile Ser Tyr Pro Lys Gly Arg Glu Ile Leu Glu Leu Val Gln 
165 170 175 

Lys Gly Ile Pro Val Thr Met Thr Ile Gly Val Gly Thr Arg His Val 
180 185 190 

Gln Glu Phe Ile Ser Gly Gln Ser Val Val Phe Val Ala Ile Ala Phe 
195 200 205 

Ile Thr Met Met Ile Ile Ser Leu Ala Trp Leu Ile Phe Tyr Tyr Ile 
210 215 220 

Gln Arg Phe Leu Tyr Thr Gly Ser Gln Ile Gly Ser Gln Ser His Arg 
225 230 235 240 

Lys Glu Thr Lys Lys Val Ile Gly Gln Leu Leu Leu His Thr Val Lys 
245 250 255 

His Gly Glu Lys Gly Ile Asp Val Asp Ala Glu Asn Cys Ala Val Cys 
260 265 270 

Ile Glu Asn Phe Lys Val Lys Asp Ile Ile Arg Ile Leu Pro Cys Lys 
275 280 285 

His Ile Phe His Arg Ile Cys Ile Asp Pro Trp Leu Leu Asp His Arg 
290 295 300 

Thr Cys Pro Met Cys Lys Leu Asp Val Ile Lys Ala Leu Gly Tyr Trp 
305 310 315 320 

Gly Glu Pro Gly Asp Val Gln Glu Met Pro Ala Pro Glu Ser Pro Pro 
325 330 335 

Gly Arg Asp Pro Ala Ala Asn Leu Ser Leu Ala Leu Pro Asp Asp Asp 
340 345 350 

Gly Ser Asp Asp Ser Ser Pro Pro Ser Ala Ser Pro Ala Glu Ser Glu 
355 360 365 

Pro Gln Cys Asp Pro Ser Phe Lys Gly Asp Ala Gly Glu Asn Thr Ala 
370 375 380 

Leu Leu Glu Ala Gly Arg Ser Asp Ser Arg His Gly Gly Pro Ile Ser 
385 390 395 400 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 291 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Met Asp Lys Gly Ser Ala Gly His Pro Gly Gly Val Leu Val Trp Gly 
1 5 10 15 

Arg Ser Pro Ala Pro Thr Ala Leu Trp Gly Ala Ser Pro Trp Leu Ser 
20 25 30 

Pro Leu Thr Ser Ala Leu Arg Gln Pro Leu His Arg Ala Pro Leu Leu 
35 40 45 

Pro Gly Gln Leu Cys Trp Ser Pro Arg Pro Leu Glu Lys Asn Lys Ala 
50 55 60 

Met Gly Arg Pro Leu Leu Leu Pro Leu Leu Leu Leu Leu Gln Pro Pro 
65 70 75 80 

Ala Phe Leu Gln Pro Gly Gly Ser Thr Gly Ser Gly Pro Ser Tyr Leu 
85 90 95 

Tyr Gly Val Thr Gln Pro Lys His Leu Ser Ala Ser Met Gly Gly Ser 
100 105 110 

Val Glu Ile Pro Phe Ser Phe Tyr Tyr Pro Trp Glu Leu Ala Ile Val 
115 120 125 

Pro Asn Val Arg Ile Ser Trp Arg Arg Gly His Phe His Gly Gln Ser 
130 135 140 

Phe Tyr Ser Thr Arg Pro Pro Ser Ile His Lys Asp Tyr Val Asn Arg 
145 150 155 160 

Leu Phe Leu Asn Trp Thr Glu Gly Gln Glu Ser Gly Phe Leu Arg Ile 
165 170 175 

Ser Asn Leu Arg Lys Glu Asp Gln Ser Val Tyr Phe Cys Arg Val Glu 
180 185 190 

Leu Asp Thr Arg Arg Ser Gly Arg Gln Gln Leu Gln Ser Ile Lys Gly 
195 200 205 

Thr Lys Leu Thr Ile Thr Gln Ala Val Thr Thr Thr Thr Thr Trp Arg 
210 215 220 

Pro Ser Ser Thr Thr Thr Ile Ala Gly Leu Arg Val Thr Glu Ser Lys 
225 230 235 240 

Gly His Ser Glu Ser Trp His Leu Ser Leu Asp Thr Ala Ile Arg Val 
245 250 255 

Ala Leu Ala Val Ala Val Leu Lys Thr Val Ile Leu Gly Leu Leu Cys 
260 265 270 

Leu Leu Leu Leu Trp Trp Arg Arg Arg Lys Gly Ser Arg Ala Pro Ser 
275 280 285 

Ser Asp Phe 
290 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 293 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Thr Val Ser Gln Arg Phe Gln Leu Ser Asn Ser Gly Pro Asn Ser 
1 5 10 15 

Thr Ile Lys Met Lys Ile Ala Leu Arg Val Leu His Leu Glu Lys Arg 
20 25 30 

Glu Arg Pro Pro Asp His Gln His Ser Ala Gln Val Lys Arg Pro Ser 
35 40 45 

Val Ser Lys Glu Gly Arg Lys Thr Ser Ile Lys Ser His Met Ser Gly 
50 55 60 

Ser Pro Gly Pro Gly Gly Ser Asn Thr Ala Pro Ser Thr Pro Val Ile 
65 70 75 80 

Gly Gly Ser Asp Lys Pro Gly Met Glu Glu Lys Ala Gln Pro Pro Glu 
85 90 95 

Ala Gly Pro Gln Gly Leu His Asp Leu Gly Arg Ser Ser Ser Ser Leu 
100 105 110 

Leu Ala Ser Pro Gly His Ile Ser Val Lys Glu Pro Thr Pro Ser Ile 
115 120 125 

Ala Ser Asp Ile Ser Leu Pro Ile Ala Thr Gln Glu Leu Arg Gln Arg 
130 135 140 

Leu Arg Gln Leu Glu Asn Gly Thr Thr Leu Gly Gln Ser Pro Leu Gly 
145 150 155 160 

Gln Ile Gln Leu Thr Ile Arg His Ser Ser Gln Arg Asn Lys Leu Ile 
165 170 175 

Val Val Val His Ala Cys Arg Asn Leu Ile Ala Phe Ser Glu Asp Gly 
180 185 190 

Ser Asp Pro Tyr Val Arg Met Tyr Leu Leu Pro Asp Lys Arg Arg Ser 
195 200 205 

Gly Arg Arg Lys Thr His Val Ser Lys Lys Thr Leu Asn Pro Val Phe 
210 215 220 

Asp Gln Ser Phe Asp Phe Ser Val Ser Leu Pro Glu Val Gln Arg Arg 
225 230 235 240 

Thr Leu Asp Val Ala Val Lys Asn Ser Gly Gly Phe Leu Ser Lys Asp 
245 250 255 

Lys Gly Leu Leu Gly Lys Val Leu Val Ala Leu Ala Ser Glu Glu Leu 
260 265 270 

Ala Lys Gly Trp Thr Gln Trp Tyr Asp Leu Thr Glu Asp Gly Thr Arg 
275 280 285 

Pro Gln Ala Met Thr 
290 



WO 98/25959 PCT/US97/22787 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 206 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

Met Glu Arg Arg His Pro Val Cys Ser Gly Thr Cys Gln Pro Thr Gln 
1 5 10 15 

Phe Arg Cys Ser Asn Gly Cys Cys Ile Asp Ser Phe Leu Glu Cys Asp 
20 25 30 

Asp Thr Pro Asn Cys Pro Asp Ala Ser Asp Glu Ala Ala Cys Glu Lys 
35 40 45 

Tyr Thr Ser Gly Phe Asp Glu Leu Gln Arg Ile His Phe Pro Ser Asp 
50 55 60 

Lys Gly His Cys Val Asp Leu Pro Asp Thr Gly Leu Cys Lys Glu Ser 
65 70 75 80 

Ile Pro Arg Trp Tyr Tyr Asn Pro Phe Ser Glu His Cys Ala Arg Phe 
85 90 95 

Thr Tyr Gly Gly Cys Tyr Gly Asn Lys Asn Asn Phe Glu Glu Glu Gln 
100 105 110 

Gln Cys Leu Glu Ser Cys Arg Gly Ile Ser Lys Lys Asp Val Phe Gly 
115 120 125 

Leu Arg Arg Glu Ile Pro Ile Pro Ser Thr Gly Ser Val Glu Met Ala 
130 135 140 

Val Ala Val Phe Leu Val Ile Cys Ile Val Val Val Val Ala Ile Leu 
145 150 155 160 

Gly Tyr Cys Phe Phe Lys Asn Gln Arg Lys Asp Phe His Gly His His 
165 170 175 

His His Pro Pro Pro Thr Pro Ala Ser Ser Thr Val Ser Thr Thr Glu 
180 185 190 

Asp Thr Glu His Leu Val Tyr Asn His Thr Thr Arg Pro Leu 
195 200 205 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 220 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Met Ala Gly Leu Ser Arg Gly Ser Ala Arg Ala Leu Leu Ala Ala Leu 
1 5 10 15 

Leu Ala Ser Thr Leu Leu Ala Leu Leu Val Ser Pro Ala Arg Gly Arg 
20 25 30 

Gly Gly Arg Asp His Gly Asp Trp Asp Glu Ala Ser Arg Leu Pro Pro 
35 40 45 

Leu Pro Pro Arg Glu Asp Ala Ala Arg Val Ala Arg Phe Val Thr His 
50 55 60 

Val Ser Asp Trp Gly Ala Leu Ala Thr Ile Ser Thr Leu Glu Ala Val 
65 70 75 80 

Arg Gly Arg Pro Phe Ala Asp Val Leu Ser Leu Ser Asp Gly Pro Pro 
85 90 95 

Gly Ala Gly Ser Gly Val Pro Tyr Phe Tyr Leu Ser Pro Leu Gln Leu 
100 105 110 

Ser Val Ser Asn Leu Gln Glu Asn Pro Tyr Ala Thr Leu Thr Met Thr 
115 120 125 

Leu Ala Gln Thr Asn Phe Cys Lys Lys His Gly Phe Asp Pro Gln Ser 
130 135 140 

Pro Leu Cys Val His Ile Met Leu Ser Gly Thr Val Thr Lys Val Asn 
145 150 155 160 

Glu Thr Glu Met Asp Ile Ala Lys His Ser Leu Phe Ile Arg His Pro 
165 170 175 

Glu Met Lys Thr Trp Pro Ser Ser His Asn Trp Phe Phe Ala Lys Leu 
180 185 190 

Asn Ile Thr Asn Ile Trp Val Leu Asp Tyr Phe Gly Gly Pro Lys Ile 
195 200 205 

Val Thr Pro Glu Glu Tyr Tyr Asn Val Thr Val Gln 
210 215 220 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 197 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Met Asp His His Cys Pro Trp Leu Asn Asn Cys Val Gly His Tyr Asn 
1 5 10 15 

His Arg Tyr Phe Phe Ser Phe Cys Phe Phe Met Thr Leu Gly Cys Val 
20 25 30 

Tyr Cys Ser Tyr Gly Ser Trp Asp Leu Phe Arg Glu Ala Tyr Ala Ala 
35 40 45 

Ile Glu Lys Met Lys Gln Leu Asp Lys Asn Lys Leu Gln Ala Val Ala 
50 55 60 

Asn Gln Thr Tyr His Gln Thr Pro Pro Pro Thr Phe Ser Phe Arg Glu 
65 70 75 80 

Arg Met Thr His Lys Ser Leu Val Tyr Leu Trp Phe Leu Cys Ser Ser 
85 90 95 

Val Ala Leu Ala Leu Gly Ala Leu Thr Val Trp His Ala Val Leu Ile 
100 105 110 

Ser Arg Gly Glu Thr Ser Ile Glu Arg His Ile Asn Lys Lys Glu Arg 
115 120 125 

Arg Arg Leu Gln Ala Lys Gly Arg Val Phe Arg Asn Pro Tyr Asn Tyr 
130 135 140 

Gly Cys Leu Asp Asn Trp Lys Val Phe Leu Gly Val Asp Thr Gly Arg 
145 150 155 160 

His Trp Leu Thr Arg Val Leu Leu Pro Ser Thr His Leu Pro His Gly 
165 170 175 

Asn Gly Met Ser Trp Glu Pro Pro Pro Trp Val Thr Ala His Ser Ala 
180 185 190 

Ser Val Met Ala Val 
195 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 451 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Met Ala Pro Leu Gly Met Leu Leu Gly Leu Leu Met Ala Ala Cys Phe 
1 5 10 15 

Thr Phe Cys Leu Ser His Gln Asn Leu Lys Glu Phe Ala Leu Thr Asn 
20 25 30 

Pro Glu Lys Ser Ser Thr Lys Glu Thr Glu Arg Lys Glu Thr Lys Ala 
35 40 45 

Glu Glu Glu Leu Asp Ala Glu Val Leu Glu Val Phe His Pro Thr His 
50 55 60 

Glu Trp Gln Ala Leu Gln Pro Gly Gln Ala Val Pro Ala Gly Ser His 
65 70 75 80 

Val Arg Leu Asn Leu Gln Thr Gly Glu Arg Glu Ala Lys Leu Gln Tyr 
85 90 95 

Glu Asp Lys Phe Arg Asn Asn Leu Lys Gly Lys Arg Leu Asp Ile Asn 
100 105 110 

Thr Asn Thr Tyr Thr Ser Gln Asp Leu Lys Ser Ala Leu Ala Lys Phe 
115 120 125 

Lys Glu Gly Ala Glu Met Glu Ser Ser Lys Glu Asp Lys Ala Arg Gln 
130 135 140 

Ala Glu Val Lys Arg Leu Phe Arg Pro Ile Glu Glu Leu Lys Lys Asp 
145 150 155 160 

Phe Asp Glu Leu Asn Val Val Ile Glu Thr Asp Met Gln Ile Met Val 
165 170 175 

Arg Leu Ile Asn Lys Phe Asn Ser Ser Ser Ser Ser Leu Glu Glu Lys 
180 185 190 

Ile Ala Ala Leu Phe Asp Leu Glu Tyr Tyr Val His Gln Met Asp Asn 
195 200 205 

Ala Gln Asp Leu Leu Ser Phe Gly Gly Leu Gln Val Val Ile Asn Gly 
210 215 220 

Leu Asn Ser Thr Glu Pro Leu Val Lys Glu Tyr Ala Ala Phe Val Leu 
225 230 235 240 

Gly Ala Ala Phe Ser Ser Asn Pro Lys Val Gln Val Glu Ala Ile Glu 
245 250 255 

Gly Gly Ala Leu Gln Lys Leu Leu Val Ile Leu Ala Thr Glu Gln Pro 
260 265 270 

Leu Thr Ala Lys Lys Lys Val Leu Phe Ala Leu Cys Ser Leu Leu Arg 
275 280 285 

His Phe Pro Tyr Ala Gln Arg Gln Phe Leu Lys Leu Gly Gly Leu Gln 
290 295 300 

Val Leu Arg Thr Leu Val Gln Glu Lys Gly Thr Glu Val Leu Ala Val 
305 310 315 320 

Arg Val Val Thr Leu Leu Tyr Asp Leu Val Thr Glu Lys Met Phe Ala 
325 330 335 

Glu Glu Glu Ala Glu Leu Thr Gln Glu Met Ser Pro Glu Lys Leu Gln 
340 345 350 

Gln Tyr Arg Gln Val His Leu Leu Pro Gly Leu Trp Glu Gln Gly Trp 
355 360 365 

Cys Glu Ile Thr Ala His Leu Leu Ala Leu Pro Glu His Asp Ala Arg 
370 375 380 

Glu Lys Val Leu Gln Thr Leu Gly Val Leu Leu Thr Thr Cys Arg Asp 
385 390 395 400 

Arg Tyr Arg Gln Asp Pro Gln Leu Gly Arg Thr Leu Ala Ser Leu Gln 
405 410 415 

Ala Glu Tyr Gln Val Leu Ala Ser Leu Glu Leu Gln Asp Gly Glu Asp 
420 425 430 

Glu Gly Tyr Phe Gln Glu Leu Leu Gly Ser Val Asn Ser Leu Leu Lys 
435 440 445 

Glu Leu Arg 
450 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 254 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

Met Trp Gln Ala Gly Lys Arg Gln Ala Ser Arg Ala Phe Ser Leu Tyr 
1 5 10 15 

Ala Asn Ile Asp Ile Leu Arg Pro Tyr Phe Asp Val Glu Pro Ala Gln 
20 25 30 

Val Arg Ser Arg Leu Leu Glu Ser Met Ile Pro Ile Lys Met Val Asn 
35 40 45 

Phe Pro Gln Lys Ile Ala Gly Glu Leu Tyr Gly Pro Leu Met Leu Val 
50 55 60 

Phe Thr Leu Val Ala Ile Leu Leu His Gly Met Lys Thr Ser Asp Thr 
65 70 75 80 

Ile Ile Arg Glu Gly Thr Leu Met Gly Thr Ala Ile Gly Thr Cys Phe 
85 90 95 

Gly Tyr Trp Leu Gly Val Ser Ser Phe Ile Tyr Phe Leu Ala Tyr Leu 
100 105 110 

Cys Asn Ala Gln Ile Thr Met Leu Gln Met Leu Ala Leu Leu Gly Tyr 
115 120 125 

Gly Leu Phe Gly His Cys Ile Val Leu Phe Ile Thr Tyr Asn Ile His 
130 135 140 

Leu His Ala Leu Phe Tyr Leu Phe Trp Leu Leu Val Gly Gly Leu Ser 
145 150 155 160 

Thr Leu Arg Met Val Ala Val Leu Val Ser Arg Thr Val Gly Pro Thr 
165 170 175 

Gln Arg Leu Leu Leu Cys Gly Thr Leu Ala Ala Leu His Met Leu Phe 
180 185 190 

Leu Leu Tyr Leu His Phe Ala Tyr His Lys Val Val Glu Gly Ile Leu 
195 200 205 

Asp Thr Leu Glu Gly Pro Asn Ile Pro Pro Ile Gln Arg Val Pro Arg 
210 215 220 

Asp Ile Pro Ala Met Leu Pro Ala Ala Arg Leu Pro Thr Thr Val Leu 
225 230 235 240 

Asn Ala Thr Ala Lys Ala Val Ala Val Thr Leu Gln Ser His 
245 250 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 221 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Met Gly Ser Glu Asn Glu Ala Leu Asp Leu Ser Met Lys Ser Val Pro 
1 5 10 15 

Trp Leu Lys Ala Gly Glu Val Ser Pro Pro Ile Phe Gln Glu Asp Ala 
20 25 30 

Ala Leu Asp Leu Ser Val Ala Ala His Arg Lys Ser Glu Pro Pro Pro 
35 40 45 

Glu Thr Leu Tyr Asp Ser Gly Ala Ser Val Asp Ser Ser Gly His Thr 
50 55 60 

Val Met Glu Lys Leu Pro Ser Gly Met Glu Ile Ser Phe Ala Pro Ala 
65 70 75 80 

Thr Ser His Glu Ala Pro Ala Met Met Asp Ser His Ile Ser Ser Ser 
85 90 95 

Asp Ala Ala Thr Glu Met Leu Ser Gln Pro Asn His Pro Ser Gly Glu 
100 105 110 

Val Lys Ala Glu Asn Asn Ile Glu Met Val Gly Glu Ser Gln Ala Ala 
115 120 125 

Lys Val Ile Val Ser Val Glu Asp Ala Val Pro Thr Ile Phe Cys Gly 
130 135 140 

Lys Ile Lys Gly Leu Ser Gly Val Ser Thr Lys Asn Phe Ser Phe Lys 
145 150 155 160 

Arg Glu Asp Ser Val Leu Gln Gly Tyr Asp Ile Asn Ser Gln Gly Glu 
165 170 175 

Glu Ser Met Gly Asn Ala Glu Pro Leu Arg Lys Pro Ile Lys Asn Arg 
180 185 190 

Ser Ile Lys Leu Lys Lys Val Asn Ser Gln Glu Val His Met Leu Pro 
195 200 205 

Ile Lys Lys Gln Arg Leu Ala Thr Phe Phe Pro Arg Lys 
210 215 220 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 266 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Met Val Lys Val Thr Phe Asn Ser Ala Leu Ala Gln Lys Glu Ala Lys 
1 5 10 15 

Lys Asp Glu Pro Lys Ser Gly Glu Glu Ala Leu Ile Ile Pro Pro Asp 
20 25 30 

Ala Val Ala Val Asp Cys Lys Asp Pro Asp Asp Val Val Pro Val Gly 
35 40 45 

Gln Arg Arg Ala Trp Cys Trp Cys Met Cys Phe Gly Leu Ala Phe Met 
50 55 60 

Leu Ala Gly Val Ile Leu Gly Gly Ala Tyr Leu Tyr Lys Tyr Phe Ala 
65 70 75 80 

Leu Gln Pro Asp Asp Val Tyr Tyr Cys Gly Ile Lys Tyr Ile Lys Asp 
85 90 95 

Asp Val Ile Leu Asn Glu Pro Ser Ala Asp Ala Pro Ala Ala Leu Tyr 
100 105 110 

Gln Thr Ile Glu Glu Asn Ile Lys Ile Phe Glu Glu Glu Glu Val Glu 
115 120 125 

Phe Ile Ser Val Pro Val Pro Glu Phe Ala Asp Ser Asp Pro Ala Asn 
130 135 140 

Ile Val His Asp Phe Asn Lys Lys Leu Thr Ala Tyr Leu Asp Leu Asn 
145 150 155 160 

Leu Asp Lys Cys Tyr Val Ile Pro Leu Asn Thr Ser Ile Val Met Pro 
165 170 175 

Pro Arg Asn Leu Leu Glu Leu Leu Ile Asn Ile Lys Ala Gly Thr Tyr 
180 185 190 

Leu Pro Gln Ser Tyr Leu Ile His Glu His Met Val Ile Thr Asp Arg 
195 200 205 

Ile Glu Asn Ile Asp His Leu Gly Phe Phe Ile Tyr Arg Leu Cys His 
210 215 220 

Asp Lys Glu Thr Tyr Lys Leu Gln Arg Arg Glu Thr Ile Lys Gly Ile 
225 230 235 240 

Gln Lys Arg Glu Ala Ser Asn Cys Phe Ala Ile Arg His Phe Glu Asn 
245 250 255 

Lys Phe Ala Val Glu Thr Leu Ile Cys Ser 
260 265 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 251 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Met Pro Thr Gly Asp Phe Asp Ser Lys Pro Ser Trp Ala Asp Gln Val 
1 5 10 15 

Glu Glu Glu Gly Glu Asp Asp Lys Cys Val Thr Ser Glu Leu Leu Lys 
20 25 30 

Gly Ile Pro Leu Ala Thr Gly Asp Thr Ser Pro Glu Pro Glu Leu Leu 
35 40 45 

Pro Gly Ala Pro Leu Pro Pro Pro Lys Glu Val Ile Asn Gly Asn Ile 
50 55 60 

Lys Thr Val Thr Glu Tyr Lys Ile Asp Glu Asp Gly Lys Lys Phe Lys 
65 70 75 80 

Ile Val Arg Thr Phe Arg Ile Glu Thr Arg Lys Ala Ser Lys Ala Val 
85 90 95 

Ala Arg Arg Lys Asn Trp Lys Lys Phe Gly Asn Ser Glu Phe Asp Pro 
100 105 110 

Pro Gly Pro Asn Val Ala Thr Thr Thr Val Ser Asp Asp Val Ser Met 
115 120 125 

Thr Phe Ile Thr Ser Lys Glu Asp Leu Asn Cys Gln Glu Glu Glu Asp 
130 135 140 

Pro Met Asn Lys Phe Lys Gly Gln Lys Ile Val Ser Cys Arg Ile Cys 
145 150 155 160 

Lys Gly Asp His Trp Thr Thr Arg Cys Pro Tyr Lys Asp Thr Leu Gly 
165 170 175 

Pro Met Gln Lys Glu Leu Ala Glu Gln Leu Gly Leu Ser Thr Gly Glu 
180 185 190 

Lys Glu Lys Leu Pro Gly Glu Leu Glu Pro Val Gln Ala Thr Gln Asn 
195 200 205 

Lys Thr Gly Lys Tyr Val Pro Pro Ser Leu Arg Asp Gly Ala Ser Arg 
210 215 220 

Arg Gly Glu Ser Met Gln Pro Asn Arg Arg Ala Asp Asp Asn Ala Thr 
225 230 235 240 

Ile Arg Val Thr Asn Leu Arg Arg Gly His Ala 
245 250 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 377 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Met Arg Arg Leu Asn Arg Lys Lys Thr Leu Ser Leu Val Lys Glu Leu 
1 5 10 15 

Asp Ala Phe Pro Lys Val Pro Glu Ser Tyr Val Glu Thr Ser Ala Ser 
20 25 30 

Gly Gly Thr Val Ser Leu Ile Ala Phe Thr Thr Met Ala Leu Leu Thr 
35 40 45 

Ile Met Glu Phe Ser Val Tyr Gln Asp Thr Trp Met Lys Tyr Glu Tyr 
50 55 60 

Glu Val Asp Lys Asp Phe Ser Ser Lys Leu Arg Ile Asn Ile Asp Ile 
65 70 75 80 

Thr Val Ala Met Lys Cys Gln Tyr Val Gly Ala Asp Val Leu Asp Leu 
85 90 95 

Ala Glu Thr Met Val Ala Ser Ala Asp Gly Leu Val Tyr Glu Pro Thr 
100 105 110 

Val Phe Asp Leu Ser Pro Gln Gln Lys Glu Trp Gln Arg Met Leu Gln 
115 120 125 

Leu Ile Gln Ser Arg Leu Gln Glu Glu His Ser Leu Gln Asp Val Ile 
130 135 140 

Phe Lys Ser Ala Phe Lys Ser Thr Ser Thr Ala Leu Pro Pro Arg Glu 
145 150 155 160 

Asp Asp Ser Ser Gln Ser Pro Asn Ala Cys Arg Ile His Gly His Leu 
165 170 175 

Tyr Val Asn Lys Val Ala Gly Asn Phe His Ile Thr Val Gly Lys Ala 
180 185 190 

Ile Pro His Pro Arg Gly His Ala His Leu Ala Ala Leu Val Asn His 
195 200 205 

Glu Ser Tyr Asn Phe Ser His Arg Ile Asp His Leu Ser Phe Gly Glu 
210 215 220 

Leu Val Pro Ala Ile Ile Asn Pro Leu Asp Gly Thr Glu Lys Ile Ala 
225 230 235 240 

Ile Asp His Asn Gln Met Phe Gln Tyr Phe Ile Thr Val Val Pro Thr 
245 250 255 

Lys Leu His Thr Tyr Lys Ile Ser Ala Asp Thr His Gln Phe Ser Val 
260 265 270 

Thr Glu Arg Glu Arg Ile Ile Asn His Ala Ala Gly Ser His Gly Val 
275 280 285 

Ser Gly Ile Phe Met Lys Tyr Asp Leu Ser Ser Leu Met Val Thr Val 
290 295 300 

Thr Glu Glu His Met Pro Phe Trp Gln Phe Phe Val Arg Leu Cys Gly 
305 310 315 320 

Ile Val Gly Gly Ile Phe Ser Thr Thr Gly Met Leu His Gly Ile Gly 
325 330 335 

Lys Phe Ile Val Glu Ile Ile Cys Cys Arg Phe Arg Leu Gly Ser Tyr 
340 345 350 

Lys Pro Val Asn Ser Val Pro Phe Glu Asp Gly His Thr Asp Asn His 
355 360 365 

Leu Pro Leu Leu Glu Asn Asn Thr His 
370 375 

(2) INFORMATION FOR SEQ ID NO:32: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 250 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Met Gly Ser Gln His Ser Ala Ala Ala Arg Pro Ser Ser Cys Arg Arg 
1 5 10 15 

Lys Gln Glu Asp Asp Arg Asp Gly Leu Leu Ala Glu Arg Glu Gln Glu 
20 25 30 

Glu Ala Ile Ala Gln Phe Pro Tyr Val Glu Phe Thr Gly Arg Asp Ser 
35 40 45 

Ile Thr Cys Leu Thr Cys Gln Gly Thr Gly Tyr Ile Pro Thr Glu Gln 
50 55 60 

Val Asn Glu Leu Val Ala Leu Ile Pro His Ser Asp Gln Arg Leu Arg 
65 70 75 80 

Pro Gln Arg Thr Lys Gln Tyr Val Leu Leu Ser Ile Leu Leu Cys Leu 
85 90 95 

Leu Ala Ser Gly Leu Val Val Phe Phe Leu Phe Pro His Ser Val Leu 
100 105 110 

Val Asp Asp Asp Gly Ile Lys Val Val Lys Val Thr Phe Asn Lys Gln 
115 120 125 

Asp Ser Leu Val Ile Leu Thr Ile Met Ala Thr Leu Lys Ile Arg Asn 
130 135 140 

Ser Asn Phe Tyr Thr Val Ala Val Thr Ser Leu Ser Ser Gln Ile Gln 
145 150 155 160 

Tyr Met Asn Thr Val Val Ser Thr Tyr Val Thr Thr Asn Val Ser Leu 
165 170 175 

Ile Pro Pro Arg Ser Glu Gln Leu Val Asn Phe Thr Gly Lys Ala Glu 
180 185 190 

Met Gly Gly Pro Phe Ser Tyr Val Tyr Phe Phe Cys Thr Val Pro Glu 
195 200 205 

Ile Leu Val His Asn Ile Val Ile Phe Met Arg Thr Ser Val Lys Ile 
210 215 220 

Ser Tyr Ile Gly Leu Met Thr Gln Ser Ser Leu Glu Thr His His Tyr 
225 230 235 240 

Val Asp Cys Gly Gly Asn Ser Thr Ala Ile 
245 250 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 374 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

Met Val Thr Cys Phe His Val Pro Tyr Ser Ala Leu Thr Met Phe Ile 
1 5 10 15 

Ser Thr Glu Gln Thr Glu Arg Asp Ser Ala Thr Ala Tyr Arg Met Thr 
20 25 30 

Val Glu Val Leu Gly Thr Val Leu Gly Thr Ala Ile Gln Gly Gln Ile 
35 40 45 

Val Gly Gln Ala Asp Thr Pro Cys Phe Gln Asp Leu Asn Ser Ser Thr 
50 55 60 

Val Ala Ser Gln Ser Ala Asn His Thr His Gly Thr Thr Ser His Arg 
65 70 75 80 

Glu Thr Gln Lys Ala Tyr Leu Leu Ala Ala Gly Val Ile Val Cys Ile 
85 90 95 

Tyr Ile Ile Cys Ala Val Ile Leu Ile Leu Gly Val Arg Glu Gln Arg 
100 105 110 

Glu Pro Tyr Glu Ala Gln Gln Ser Glu Pro Ile Ala Tyr Phe Arg Gly 
115 120 125 

Leu Arg Leu Val Met Ser His Gly Pro Tyr Ile Lys Leu Ile Thr Gly 
130 135 140 

Phe Leu Phe Thr Ser Leu Ala Phe Met Leu Val Glu Gly Asn Phe Val 
145 150 155 160 

Leu Phe Cys Thr Tyr Thr Leu Gly Phe Arg Asn Glu Phe Gln Asn Leu 
165 170 175 

Leu Leu Ala Ile Met Leu Ser Ala Thr Leu Thr Ile Pro Ile Trp Gln 
180 185 190 

Trp Phe Leu Thr Arg Phe Gly Lys Lys Thr Ala Val Tyr Val Gly Ile 
195 200 205 

Ser Ser Ala Val Pro Phe Leu Ile Leu Val Ala Leu Met Glu Ser Asn 
210 215 220 

Leu Ile Ile Thr Tyr Ala Val Ala Val Ala Ala Gly Ile Ser Val Ala 
225 230 235 240 

Ala Ala Phe Leu Leu Pro Trp Ser Met Leu Pro Asp Val Ile Asp Asp 
245 250 255 

Phe His Leu Lys Gln Pro His Phe His Gly Thr Glu Pro Ile Phe Phe 
260 265 270 

Ser Phe Tyr Val Phe Phe Thr Lys Phe Ala Ser Gly Val Ser Leu Gly 
275 280 285 

Ile Ser Thr Leu Ser Leu Asp Phe Ala Gly Tyr Gln Thr Arg Gly Cys 
290 295 300 

Ser Gln Pro Glu Arg Val Lys Phe Thr Leu Asn Met Leu Val Thr Met 
305 310 315 320 

Ala Pro Ile Val Leu Ile Leu Leu Gly Leu Leu Leu Phe Lys Met Tyr 
325 330 335 

Pro Ile Asp Glu Glu Arg Arg Arg Gln Asn Lys Lys Ala Leu Gln Ala 
340 345 350 

Leu Arg Asp Glu Ala Ser Ser Ser Gly Cys Ser Glu Thr Asp Ser Thr 
355 360 365 

Glu Leu Ala Ser Ile Leu 
370 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 334 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Met Val Asn Asp Pro Pro Val Pro Ala Leu Leu Trp Ala Gln Glu Val 
1 5 10 15 
Gly Gln Val Leu Ala Gly Arg Ala Arg Arg Leu Leu Leu Gln Phe Gly 
20 25 30 

Val Leu Phe Cys Thr Ile Leu Leu Leu Leu Trp Val Ser Val Phe Leu 
35 40 45 

Tyr Gly Ser Phe Tyr Tyr Ser Tyr Met Pro Thr Val Ser His Leu Ser 
50 55 60 

Pro Val His Phe Tyr Tyr Arg Thr Asp Cys Asp Ser Ser Thr Thr Ser 
65 70 75 80 

Leu Cys Ser Phe Pro Val Ala Asn Val Ser Leu Thr Lys Gly Gly Arg 
85 90 95 

Asp Arg Val Leu Met Tyr Gly Gln Pro Tyr Arg Val Thr Leu Glu Leu 
100 105 110 

Glu Leu Pro Glu Ser Pro Val Asn Gln Asp Leu Gly Met Phe Leu Val 
115 120 125 

Thr Ile Ser Cys Tyr Thr Arg Gly Gly Arg Ile Ile Ser Thr Ser Ser 
130 135 140 

Arg Ser Val Met Leu His Tyr Arg Ser Asp Leu Leu Gln Met Leu Asp 
145 150 155 160 

Thr Leu Val Phe Ser Ser Leu Leu Leu Phe Gly Phe Ala Glu Gln Lys 
165 170 175 

Gln Leu Leu Glu Val Glu Leu Tyr Ala Asp Tyr Arg Glu Asn Ser Tyr 
180 185 190 

Val Pro Thr Thr Gly Ala Ile Ile Glu Ile His Ser Lys Arg Ile Gln 
195 200 205 

Leu Tyr Gly Ala Tyr Leu Arg Ile His Ala His Phe Thr Gly Leu Arg 
210 215 220 

Tyr Leu Leu Tyr Asn Phe Pro Met Thr Cys Ala Phe Ile Gly Val Ala 
225 230 235 240 

Ser Asn Phe Thr Phe Leu Ser Val Ile Val Leu Phe Ser Tyr Met Gln 
245 250 255 

Trp Val Trp Gly Gly Ile Trp Pro Arg His Arg Phe Ser Leu Gln Val 
260 265 270 

Asn Ile Arg Lys Arg Asp Asn Ser Arg Lys Glu Val Gln Arg Arg Ile 
275 280 285 

Ser Ala His Gln Pro Gly Pro Glu Gly Gln Glu Glu Ser Thr Pro Gln 
290 295 300 

Ser Asp Val Thr Glu Asp Gly Glu Ser Pro Glu Asp Pro Ser Gly Thr 
305 310 315 320 

Glu Val Ser Cys Pro Arg Arg Arg Asn Gln Ile Ser Ser Pro 
325 330 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 276 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: None 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Met Thr His Pro Gly Thr Gly Asp Ile Ile Ala Val Met Ile Thr Glu 
1 5 10 15 

Leu Arg Gly Lys Asp Ile Leu Ser Tyr Leu Glu Lys Asn Ile Ser Val 
20 25 30 

Gln Met Thr Ile Ala Val Gly Thr Arg Met Pro Pro Lys Asn Phe Ser 
35 40 45 

Arg Gly Ser Leu Val Phe Val Ser Ile Ser Phe Ile Val Leu Met Ile 
50 55 60 

Ile Ser Ser Ala Trp Leu Ile Phe Tyr Phe Ile Gln Lys Ile Arg Tyr 
65 70 75 80 

Thr Asn Ala Arg Asp Arg Asn Gln Arg Arg Leu Gly Asp Ala Ala Lys 
85 90 95 

Lys Ala Ile Ser Lys Leu Thr Thr Arg Thr Val Lys Lys Gly Asp Lys 
100 105 110 

Glu Thr Asp Pro Asp Phe Asp His Cys Ala Val Cys Ile Glu Ser Tyr 
115 120 125 

Lys Gln Asn Asp Val Val Arg Ile Leu Pro Cys Lys His Val Phe His 
130 135 140 

Lys Ser Cys Val Asp Pro Trp Leu Ser Glu His Cys Thr Cys Pro Met 
145 150 155 160 

Cys Lys Leu Asn Ile Leu Lys Ala Leu Gly Ile Val Pro Asn Leu Pro 
165 170 175 

Cys Thr Asp Asn Val Ala Phe Asp Met Glu Arg Leu Thr Arg Thr Gln 
180 185 190 

Ala Val Asn Arg Arg Ser Ala Leu Gly Asp Leu Ala Gly Asp Asn Ser 
195 200 205 

Leu Gly Leu Glu Pro Leu Arg Thr Ser Gly Ile Ser Pro Leu Pro Gln 
210 215 220 

Asp Gly Glu Leu Thr Pro Arg Thr Gly Glu Ile Asn Ile Ala Val Thr 
225 230 235 240 

Lys Glu Trp Phe Ile Ile Ala Ser Phe Gly Leu Leu Ser Ala Leu Thr 
245 250 255 

Leu Cys Tyr Met Ile Ile Arg Ala Thr Ala Ser Leu Asn Ala Asn Glu 
260 265 270 

Val Glu Trp Phe 
275 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 210 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Met Ala Asn Ser Gly Leu Gln Leu Leu Gly Phe Ser Met Ala Leu Leu 
1 5 10 15 

Gly Trp Val Gly Leu Val Ala Cys Thr Ala Ile Pro Gln Trp Gln Met 
20 25 30 

Ser Ser Tyr Ala Gly Asp Asn Ile Ile Thr Ala Gln Ala Met Tyr Lys 
35 40 45 
Gly Leu Trp Met Asp Cys Val Thr Gln Ser Thr Gly Met Met Ser Cys 
50 55 60 

Lys Met Tyr Asp Ser Val Leu Ala Leu Ser Ala Ala Leu Gln Ala Thr 
65 70 75 80 

Arg Ala Leu Met Val Val Ser Leu Val Leu Gly Phe Leu Ala Met Phe 
85 90 95 

Val Ala Thr Met Gly Met Lys Cys Thr Arg Cys Gly Gly Asp Asp Lys 
100 105 110 

Val Lys Lys Ala Arg Ile Ala Met Gly Gly Gly Ile Ile Phe Ile Val 
115 120 125 

Ala Gly Leu Ala Ala Leu Val Ala Cys Ser Trp Tyr Gly His Gln Ile 
130 135 140 

Val Thr Asp Phe Tyr Asn Pro Leu Ile Pro Thr Asn Ile Lys Tyr Glu 
145 150 155 160 

Phe Gly Pro Ala Ile Phe Ile Gly Trp Ala Gly Ser Ala Leu Val Ile 
165 170 175 

Leu Gly Gly Ala Leu Leu Ser Cys Ser Cys Pro Gly Asn Glu Ser Lys 
180 185 190 

Ala Gly Tyr Arg Ala Pro Arg Ser Tyr Pro Lys Ser Asn Ser Ser Lys 
195 200 205 

Glu Tyr 
210 

(2) INFORMATION FOR SEQ ID NO:37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 476 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 
Met Ile Arg Pro Gln Leu Arg Thr Ala Gly Leu Gly Arg Cys Leu Leu 
1 5 10 15 

Pro Gly Leu Leu Leu Leu Leu Val Pro Val Leu Trp Ala Gly Ala Glu 

20 25 30 

Lys Leu His Thr Gln Pro Ser Cys Pro Ala Val Cys Gln Pro Thr Arg 

35 40 45 

Cys Pro Ala Leu Pro Thr Cys Ala Leu Gly Thr Thr Pro Val Phe Asp 

50 55 60 

Leu Cys Arg Cys Cys Arg Val Cys Pro Ala Ala Glu Arg Glu Val Cys 
65 70 75 80 

Gly Gly Ala Gln Gly Gln Pro Cys Ala Pro Gly Leu Gln Cys Leu Gln 

85 90 95 

Pro Leu Arg Pro Gly Phe Pro Ser Thr Cys Gly Cys Pro Thr Leu Gly 

100 105 110 

Gly Ala Val Cys Gly Ser Asp Arg Arg Thr Tyr Pro Ser Met Cys Ala 

115 120 125 

Leu Arg Ala Glu Asn Arg Ala Ala Arg Arg Leu Gly Lys Val Pro Ala 

130 135 140 

Val Pro Val Gln Trp Gly Asn Cys Gly Asp Thr Gly Thr Arg Ser Ala 
145 150 155 160 

Gly Pro Leu Arg Arg Asn Tyr Asn Phe Ile Ala Ala Val Val Glu Lys 

165 170 175 

Val Ala Pro Ser Val Val His Val Gln Leu Trp Gly Arg Leu Leu His 

180 185 190 

Gly Ser Arg Leu Val Pro Val Tyr Ser Gly Ser Gly Phe Ile Val Ser 

195 200 205 

Glu Asp Gly Leu Ile Ile Thr Asn Ala His Val Val Arg Asn Gln Gln 

210 215 220 

Trp Ile Glu Val Val Leu Gln Asn Gly Ala Arg Tyr Glu Ala Val Val 
225 230 235 240 

Lys Asp Ile Asp Leu Lys Leu Asp Leu Ala Val Ile Lys Ile Glu Ser 

245 250 255 

Asn Ala Glu Leu Pro Val Leu Met Leu Gly Arg Ser Ser Asp Leu Arg 

260 265 270 

Ala Gly Glu Phe Val Val Ala Leu Gly Ser Pro Phe Ser Leu Gln Asn 

275 280 285 

Thr Ala Thr Ala Gly Ile Val Ser Thr Lys Gln Arg Gly Gly Lys Glu 
290 295 300 

Leu Gly Met Lys Asp Ser Asp Met Asp Tyr Val Gln Ile Asp Ala Thr 
305 310 315 320 

Ile Asn Tyr Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Asp Gly Asp 

325 330 335 

Val Ile Gly Val Asn Ser Leu Arg Val Thr Asp Gly Ile Ser Phe Ala 

340 345 350 

Ile Pro Ser Asp Arg Val Arg Gln Phe Leu Ala Glu Tyr His Glu His 

355 360 365 

Gln Met Lys Gly Lys Ala Phe Ser Asn Lys Lys Tyr Leu Gly Leu Gln 

370 375 380 

Met Leu Ser Leu Thr Val Pro Leu Ser Glu Glu Leu Lys Met His Tyr 
385 390 395 400 

Pro Asp Phe Pro Asp Val Ser Ser Gly Val Tyr Val Cys Lys Val Val 

405 410 415 

Glu Gly Thr Ala Ala Gln Ser Ser Gly Leu Arg Asp His Asp Val Ile 

420 425 430 

Val Asn Ile Asn Gly Lys Pro Ile Thr Thr Thr Thr Asp Val Val Lys 

435 440 445 

Ala Leu Asp Ser Asp Ser Leu Ser Met Ala Val Leu Arg Gly Lys Asp 

450 455 460 

Asn Leu Leu Leu Thr Val Ile Pro Glu Thr Ile Asn 
465 470 475 



(2) INFORMATION FOR SEQ ID NO:38: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 266 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 
Met Val Lys Val Thr Phe Asn Ser Ala Leu Ala Gln Lys Glu Ala Lys 

1 5 10 15 

Lys Asp Glu Pro Glu Ser Gly Glu Glu Ala Leu Ile Ile Pro Pro Asp 

20 25 30 

Ala Val Ala Val Asp Cys Lys Asp Pro Asp Asp Val Val Pro Val Gly 

35 40 45 

Gln Arg Arg Ala Trp Cys Trp Cys Met Cys Phe Gly Leu Ala Phe Met 

50 55 60 

Leu Ala Gly Val Ile Leu Gly Gly Ala Tyr Leu Tyr Lys Tyr Phe Ala 
65 70 75 80 

Leu Gln Pro Asp Asp Val Tyr Tyr Cys Gly Ile Lys Tyr Ile Lys Asp 

85 90 95 

Asp Val Ile Leu Asn Glu Pro Ser Ala Asp Ala Pro Ala Ala Leu Tyr 

100 105 110 

Gln Thr Ile Glu Glu Asn Ile Lys Ile Phe Glu Glu Glu Glu Val Glu 

115 120 125 

Phe Ile Ser Val Pro Val Pro Glu Phe Ala Asp Ser Asp Pro Ala Asn 

130 135 140 

Ile Val His Asp Phe Asn Lys Lys Leu Thr Ala Tyr Leu Asp Leu Asn 
145 150 155 160 

Leu Asp Lys Cys Tyr Val Ile Pro Leu Asn Thr Ser Ile Val Met Pro 

165 170 175 

Pro Arg Asn Leu Leu Glu Leu Leu Ile Asn Ile Lys Ala Gly Thr Tyr 

180 185 190 

Leu Pro Gln Ser Tyr Leu Ile His Glu His Met Val Ile Thr Asp Arg 

195 200 205 

Ile Glu Asn Ile Asp His Leu Gly Phe Phe Ile Tyr Arg Leu Cys His 

210 215 220 

Asp Lys Glu Thr Tyr Lys Leu Gln Arg Arg Glu Thr Ile Lys Gly Ile 
225 230 235 240 

Gln Lys Arg Glu Ala Ser Asn Cys Phe Ala Ile Arg His Phe Glu Asn 

245 250 255 

Lys Phe Ala Val Glu Thr Leu Ile Cys Ser 

260 265 

We Claim: 

1. An isolated and purified human protein having an amino acid 
sequence selected from the group consisting of the amino acid sequences shown in 
SEQ ID NOs:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 
and 38. 

2. An isolated and purified human protein having an amino acid 
sequence which is at least 85% identical to an amino acid sequence selected from 
the group consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 
23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

3. An isolated and purified human polypeptide comprising at least 6 
contiguous amino acids of an amino acid sequence selected from the group 
consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 23, 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

4. A fusion protein comprising a first protein segment and a second 
protein segment fused together by means of a peptide bond, wherein the first 
protein segment consists of at least 6 contiguous amino acids selected from the 
group consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 23, 
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38. 

5. A preparation of antibodies which specifically bind to the human 
protein of claim 1. 

6. An isolated and purified subgenomic polynucleotide having a 
nucleotide sequence selected from the group consisting of the nucleotide sequences 
shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 
and 19. 

7. An isolated gene corresponding to a cDNA sequence selected from 
the group consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. 

8. A DNA construct for expressing all or a portion of a human protein 
having an amino acid sequence selected from the group consisting of the amino acid 
sequences shown in SEQ ID NOs:20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 
33, 34, 35, 36, 37, and 38, comprising: 
a promoter; and 

a polynucleotide segment encoding at least 6 contiguous amino acids 
of the human protein, wherein the polynucleotide segment is located downstream 
from the promoter, wherein transcription of the polynucleotide segment initiates at 
or 3' to the promoter. 

9. A host cell comprising a DNA construct comprising: 
a promoter; and 

a polynucleotide segment encoding at least 6 contiguous amino acids 
of a human protein having an amino acid sequence selected from the group 
consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 23, 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38, wherein the 
polynucleotide segment is located downstream from the promoter and wherein 
transcription of the polynucleotide segment initiates at or 3' to the promoter. 

10. A homologously recombinant cell having incorporated therein a new 
transcription initiation unit, wherein the new transcription initiation unit comprises 
in 5' to 3' order: 

(a) an exogenous regulatory sequence; 

(b) an exogenous exon; and 

(c) a splice donor site, 

wherein the transcription initiation unit is located upstream to a coding sequence of 
a gene, wherein the gene comprises a nucleotide sequence selected from the group 
consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 and wherein the exogenous regulatory 
sequence controls transcription of the coding sequence of the gene. 

11. A method of producing a human protein, comprising the steps of: 
growing a culture of a cell comprising a DNA construct comprising 
(1) a promoter and (2) a polynucleotide segment encoding at least 6 contiguous 
amino acids of a human protein having an amino acid sequence selected from the 
group consisting of the amino acid sequences shown in SEQ ID NOs:20, 21, 22, 
23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38, wherein the 
polynucleotide segment is located downstream from the promoter and wherein 
transcription of the polynucleotide segment initiates at or 3' to the promoter; and 
purifying the protein from the culture. 

12. A method of producing a human protein, comprising the steps of: 
growing a culture of a homologously recombinant cell having 
incorporated therein a new transcription initiation unit, wherein the new 
transcription initiation unit comprises in 5' to 3' order: 

(a) an exogenous regulatory sequence; 

(b) an exogenous exon; and 

(c) a splice donor site, 

wherein the transcription initiation unit is located upstream to a coding sequence of 
a gene, wherein the gene comprises a nucleotide sequence selected from the group 
consisting of the nucleotide sequences shown in SEQ ID NOs:1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 and wherein the exogenous regulatory 
sequence controls transcription of the coding sequence of the gene; and 
purifying the protein from the culture. 

13. A method of identifying a secreted polypeptide which is modified by 
rough microsomes, comprising the steps of: 

transcribing in vitro a population of cDNA molecules whereby a 
population of cRNA molecules is formed; 

translating a first portion of the population of cRNA molecules in 
vitro in the absence of rough microsomes whereby a first population of polypeptides 
is formed; 

translating a second portion of the population of cRNA molecules in vitro in 
the presence of rough microsomes whereby a second population of polypeptides is 
formed; 

comparing the first population of polypeptides with the second 
population of polypeptides; and 

detecting polypeptide members of the second population which have 
been modified by the rough microsomes. 